learning_ai_invt_trdg/backend/ARCHITECTURE_RISK_ANALYSIS.md

9.0 KiB

Architecture Risk Analysis

Date: 2026-02-16
Scope: bytelyst-trading-bot-service + bytelyst-trading-dashboard-web
Mode: Risk analysis only (no redesign, no implementation proposal)

Phase Roadmap Tracking

Phase 1: Multi-Tenant Isolation Hardening

Phase 2: Restart Durability & Snapshot

Architecture Risk Analysis

  • Mitigated in code (pending environment rollout): Multi-tenant state exposure risk.
    • Runtime REST responses and websocket emissions are now tenant-scoped by authenticated user.
    • Evidence (fix commits): a91e253, 089af51.
  • Critical: Snapshot architecture inconsistency for restart durability.
    • Runtime writes snapshots with user_id='system_backup' while schema requires UUID FK.
    • Startup flow does not actually restore from DB snapshot path.
    • Evidence: bytelyst-trading-bot-service/src/services/apiServer.ts:375, bytelyst-trading-bot-service/schema/008_schema_gap_backfill.sql:205, bytelyst-trading-bot-service/src/services/apiServer.ts:637, bytelyst-trading-bot-service/src/services/apiServer.ts:642.
  • High: Non-transactional lifecycle persistence across orders/history/runtime maps can drift under partial failures.
    • Evidence: bytelyst-trading-bot-service/src/services/TradeExecutor.ts:597, bytelyst-trading-bot-service/src/services/TradeExecutor.ts:642, bytelyst-trading-bot-service/src/services/TradeExecutor.ts:801, bytelyst-trading-bot-service/src/services/TradeExecutor.ts:920.
  • High: Single-process authority assumption.
    • In-memory maps are primary state authorities, with no distributed coordination boundary.
    • Evidence: bytelyst-trading-bot-service/src/services/TradeExecutor.ts:68, bytelyst-trading-bot-service/src/services/apiServer.ts:191.

Missing Enterprise Components

  • Tenant-scoped state/event partitioning for WebSocket/REST runtime payloads.
  • RLS validation coverage for orders and trade_history is now present in migration + policy checks; environment rollout remains.
    • Evidence (fix commit): d9395b4.
  • Durable structured audit sink for control-plane actions (currently log-line only).
    • Evidence: bytelyst-trading-bot-service/src/services/apiServer.ts:399.
  • Metrics export wiring is not active in API/runtime integration.
    • Evidence: bytelyst-trading-bot-service/src/services/MetricsService.ts:4.

Structural Integrity Risks

  • API profile state caches are write/update only, no explicit delete path in merge maps.
    • Evidence: bytelyst-trading-bot-service/src/services/apiServer.ts:1269, bytelyst-trading-bot-service/src/services/apiServer.ts:1285.
  • Lifecycle UI can become window-biased due to hard query limits, causing false orphan/mismatch interpretations.
    • Evidence: bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:346, bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:397, bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:664.
  • Client-side synthetic lifecycle IDs can diverge from backend trace truth.
    • Evidence: bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:293, bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:297.

Lifecycle Integrity Risks

  • Finalization suppression when entry chain is missing protects integrity but can produce realized PnL visibility gaps.
    • Evidence: bytelyst-trading-bot-service/src/services/TradeExecutor.ts:843.
  • Mixed-side virtual position reconstruction collapses to dominant side, masking opposite residual slices.
    • Evidence: bytelyst-trading-bot-service/src/services/SupabaseService.ts:1254.
  • Exit lifecycle control is symbol-scoped, not trade-scoped, for in-flight transition locking.
    • Evidence: bytelyst-trading-bot-service/src/services/TradeExecutor.ts:76, bytelyst-trading-bot-service/src/services/TradeExecutor.ts:548.

Capital Isolation Risks

  • Capital checks rely on local in-memory open positions and exclude pending intent reservation.
    • Evidence: bytelyst-trading-bot-service/src/index.ts:131, bytelyst-trading-bot-service/src/services/AutoTrader.ts:167, bytelyst-trading-bot-service/src/services/ManualTrader.ts:26.
  • Concurrent profile evaluations per symbol can pass guard checks before committed state convergence.
    • Evidence: bytelyst-trading-bot-service/src/index.ts:523, bytelyst-trading-bot-service/src/index.ts:530.

Exchange Reconciliation Risks

  • Reconciliation parity checks only recent subset (limit=25 per profile), allowing long-tail drift persistence.
    • Evidence: bytelyst-trading-bot-service/src/index.ts:845.
  • Pending-order recovery scope (pending_new) is narrower than stale-order sync scope (pending_new|pending|accepted|new).
    • Evidence: bytelyst-trading-bot-service/src/services/SupabaseService.ts:655, bytelyst-trading-bot-service/src/services/SupabaseService.ts:597.
  • Runtime pending order broadcast normalizes to pending_new, which can transiently differ from DB/exchange truth.
    • Evidence: bytelyst-trading-bot-service/src/services/TradeExecutor.ts:957.

Restart Recovery Risks

  • Critical idempotency and lifecycle coordination maps are process-local and reset on restart.
    • Evidence: bytelyst-trading-bot-service/src/services/TradeExecutor.ts:73, bytelyst-trading-bot-service/src/services/TradeExecutor.ts:74, bytelyst-trading-bot-service/src/services/TradeExecutor.ts:75, bytelyst-trading-bot-service/src/services/TradeExecutor.ts:76.
  • Startup restore sequence prefers local file; DB snapshot restore path is intentionally not used.
    • Evidence: bytelyst-trading-bot-service/src/services/apiServer.ts:621, bytelyst-trading-bot-service/src/services/apiServer.ts:642.

Concurrency and Locking Risks

  • Loop guards are local booleans only; no shared lock domain across services/instances.
    • Evidence: bytelyst-trading-bot-service/src/index.ts:462, bytelyst-trading-bot-service/src/index.ts:463, bytelyst-trading-bot-service/src/services/tradeMonitor.ts:10, bytelyst-trading-bot-service/src/services/OrderStatusSyncService.ts:47.
  • Profile-scale fan-out creates one monitor and one order-sync worker per profile plus orphan sync workers, expanding race and rate-pressure surfaces.
    • Evidence: bytelyst-trading-bot-service/src/index.ts:196, bytelyst-trading-bot-service/src/index.ts:198, bytelyst-trading-bot-service/src/index.ts:207.

Observability Gaps

  • Readiness signal only tracks loop recency and exchange connectivity flag; lacks explicit monitor/order-sync/profile-sync liveness SLO dimensions.
    • Evidence: bytelyst-trading-bot-service/src/services/apiServer.ts:352.
  • No persistent operational audit table for lifecycle control actions.
    • Evidence: bytelyst-trading-bot-service/src/services/apiServer.ts:399.
  • Metrics service exists but is not exposed as runtime telemetry endpoint.
    • Evidence: bytelyst-trading-bot-service/src/services/MetricsService.ts:114.

Production Readiness Score

  • Score: 58 / 100
  • Rationale: Core lifecycle execution logic is materially hardened, but enterprise-grade readiness is constrained by tenant data exposure, restart durability gaps, local-process locking assumptions, and incomplete production observability integration.