9.0 KiB
9.0 KiB
Architecture Risk Analysis
Date: 2026-02-16
Scope: bytelyst-trading-bot-service + bytelyst-trading-dashboard-web
Mode: Risk analysis only (no redesign, no implementation proposal)
Phase Roadmap Tracking
Phase 1: Multi-Tenant Isolation Hardening
- Partition runtime REST state by authenticated
user_idfor:/api/status/api/alerts/api/symbol/:symbol- Commit: https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/a91e253
- Remove global runtime websocket broadcast surfaces and emit tenant-scoped updates only.
- Prevent stale profile cache leakage on profile removal (
unregisterManualTradercleanup path). - Enforce RLS coverage for
ordersandtrade_historyin schema + CI policy verification. - Add validation proving tenant isolation (
user Acannot accessuser Bprofile/trade runtime data). - Run
schema/009_tenant_rls_orders_trade_history.sqlon each target environment (dev/staging/prod) and verify policy existence in Supabase.
Phase 2: Restart Durability & Snapshot
- Create durable
bot_state_snapshotstable withauth.usersFK and RLS guard, then verify presence via policy gate. - Replace file-only state restore/persistence with
SupabaseService.saveBotStateSnapshot/loadLatestBotStateSnapshotbacked by the new table and asynchronousApiServersnapshot flow. - Rehydrate
TradeExecutoron startup by pulling open orders from Supabase/exchange, rebuilding lifecycle tracking, and validating duplicates via DB queries instead of in-memory maps. - Apply the new snapshot migration and ensure each environment records a valid
snapshot_user_id(or derived owner) before relying on DB restore.
Architecture Risk Analysis
- Mitigated in code (pending environment rollout): Multi-tenant state exposure risk.
- Runtime REST responses and websocket emissions are now tenant-scoped by authenticated user.
- Evidence (fix commits):
a91e253,089af51.
- Critical: Snapshot architecture inconsistency for restart durability.
- Runtime writes snapshots with
user_id='system_backup'while schema requires UUID FK. - Startup flow does not actually restore from DB snapshot path.
- Evidence:
bytelyst-trading-bot-service/src/services/apiServer.ts:375,bytelyst-trading-bot-service/schema/008_schema_gap_backfill.sql:205,bytelyst-trading-bot-service/src/services/apiServer.ts:637,bytelyst-trading-bot-service/src/services/apiServer.ts:642.
- Runtime writes snapshots with
- High: Non-transactional lifecycle persistence across orders/history/runtime maps can drift under partial failures.
- Evidence:
bytelyst-trading-bot-service/src/services/TradeExecutor.ts:597,bytelyst-trading-bot-service/src/services/TradeExecutor.ts:642,bytelyst-trading-bot-service/src/services/TradeExecutor.ts:801,bytelyst-trading-bot-service/src/services/TradeExecutor.ts:920.
- Evidence:
- High: Single-process authority assumption.
- In-memory maps are primary state authorities, with no distributed coordination boundary.
- Evidence:
bytelyst-trading-bot-service/src/services/TradeExecutor.ts:68,bytelyst-trading-bot-service/src/services/apiServer.ts:191.
Missing Enterprise Components
- Tenant-scoped state/event partitioning for WebSocket/REST runtime payloads.
- RLS validation coverage for
ordersandtrade_historyis now present in migration + policy checks; environment rollout remains.- Evidence (fix commit):
d9395b4.
- Evidence (fix commit):
- Durable structured audit sink for control-plane actions (currently log-line only).
- Evidence:
bytelyst-trading-bot-service/src/services/apiServer.ts:399.
- Evidence:
- Metrics export wiring is not active in API/runtime integration.
- Evidence:
bytelyst-trading-bot-service/src/services/MetricsService.ts:4.
- Evidence:
Structural Integrity Risks
- API profile state caches are write/update only, no explicit delete path in merge maps.
- Evidence:
bytelyst-trading-bot-service/src/services/apiServer.ts:1269,bytelyst-trading-bot-service/src/services/apiServer.ts:1285.
- Evidence:
- Lifecycle UI can become window-biased due to hard query limits, causing false orphan/mismatch interpretations.
- Evidence:
bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:346,bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:397,bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:664.
- Evidence:
- Client-side synthetic lifecycle IDs can diverge from backend trace truth.
- Evidence:
bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:293,bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:297.
- Evidence:
Lifecycle Integrity Risks
- Finalization suppression when entry chain is missing protects integrity but can produce realized PnL visibility gaps.
- Evidence:
bytelyst-trading-bot-service/src/services/TradeExecutor.ts:843.
- Evidence:
- Mixed-side virtual position reconstruction collapses to dominant side, masking opposite residual slices.
- Evidence:
bytelyst-trading-bot-service/src/services/SupabaseService.ts:1254.
- Evidence:
- Exit lifecycle control is symbol-scoped, not trade-scoped, for in-flight transition locking.
- Evidence:
bytelyst-trading-bot-service/src/services/TradeExecutor.ts:76,bytelyst-trading-bot-service/src/services/TradeExecutor.ts:548.
- Evidence:
Capital Isolation Risks
- Capital checks rely on local in-memory open positions and exclude pending intent reservation.
- Evidence:
bytelyst-trading-bot-service/src/index.ts:131,bytelyst-trading-bot-service/src/services/AutoTrader.ts:167,bytelyst-trading-bot-service/src/services/ManualTrader.ts:26.
- Evidence:
- Concurrent profile evaluations per symbol can pass guard checks before committed state convergence.
- Evidence:
bytelyst-trading-bot-service/src/index.ts:523,bytelyst-trading-bot-service/src/index.ts:530.
- Evidence:
Exchange Reconciliation Risks
- Reconciliation parity checks only recent subset (
limit=25per profile), allowing long-tail drift persistence.- Evidence:
bytelyst-trading-bot-service/src/index.ts:845.
- Evidence:
- Pending-order recovery scope (
pending_new) is narrower than stale-order sync scope (pending_new|pending|accepted|new).- Evidence:
bytelyst-trading-bot-service/src/services/SupabaseService.ts:655,bytelyst-trading-bot-service/src/services/SupabaseService.ts:597.
- Evidence:
- Runtime pending order broadcast normalizes to
pending_new, which can transiently differ from DB/exchange truth.- Evidence:
bytelyst-trading-bot-service/src/services/TradeExecutor.ts:957.
- Evidence:
Restart Recovery Risks
- Critical idempotency and lifecycle coordination maps are process-local and reset on restart.
- Evidence:
bytelyst-trading-bot-service/src/services/TradeExecutor.ts:73,bytelyst-trading-bot-service/src/services/TradeExecutor.ts:74,bytelyst-trading-bot-service/src/services/TradeExecutor.ts:75,bytelyst-trading-bot-service/src/services/TradeExecutor.ts:76.
- Evidence:
- Startup restore sequence prefers local file; DB snapshot restore path is intentionally not used.
- Evidence:
bytelyst-trading-bot-service/src/services/apiServer.ts:621,bytelyst-trading-bot-service/src/services/apiServer.ts:642.
- Evidence:
Concurrency and Locking Risks
- Loop guards are local booleans only; no shared lock domain across services/instances.
- Evidence:
bytelyst-trading-bot-service/src/index.ts:462,bytelyst-trading-bot-service/src/index.ts:463,bytelyst-trading-bot-service/src/services/tradeMonitor.ts:10,bytelyst-trading-bot-service/src/services/OrderStatusSyncService.ts:47.
- Evidence:
- Profile-scale fan-out creates one monitor and one order-sync worker per profile plus orphan sync workers, expanding race and rate-pressure surfaces.
- Evidence:
bytelyst-trading-bot-service/src/index.ts:196,bytelyst-trading-bot-service/src/index.ts:198,bytelyst-trading-bot-service/src/index.ts:207.
- Evidence:
Observability Gaps
- Readiness signal only tracks loop recency and exchange connectivity flag; lacks explicit monitor/order-sync/profile-sync liveness SLO dimensions.
- Evidence:
bytelyst-trading-bot-service/src/services/apiServer.ts:352.
- Evidence:
- No persistent operational audit table for lifecycle control actions.
- Evidence:
bytelyst-trading-bot-service/src/services/apiServer.ts:399.
- Evidence:
- Metrics service exists but is not exposed as runtime telemetry endpoint.
- Evidence:
bytelyst-trading-bot-service/src/services/MetricsService.ts:114.
- Evidence:
Production Readiness Score
- Score: 58 / 100
- Rationale: Core lifecycle execution logic is materially hardened, but enterprise-grade readiness is constrained by tenant data exposure, restart durability gaps, local-process locking assumptions, and incomplete production observability integration.