# Trading Control State Persistence
## Where is the Pause/Resume State Stored?
The trading control state (`PAUSED` or `RUNNING`) is kept in **three locations** to ensure reliability and recovery: in memory, on disk, and in the database.

---
## 1. In-Memory (Primary Runtime State)
**Location**: `HealthTracker` singleton in `healthTracker.ts`
```typescript
// bytelyst-trading-bot-service/src/services/healthTracker.ts
export class HealthTracker {
  private tradingControl: TradingControlSnapshot = {
    mode: 'RUNNING',
    lastChangedBy: 'system',
    lastChangedAt: Date.now()
  };

  public isPaused(): boolean {
    return this.tradingControl.mode === 'PAUSED';
  }

  public recordTradingControl(update: TradingControlSnapshot): void {
    this.tradingControl = update;
    // Triggers persistence to disk and database
  }
}
```
**Purpose**: Fast access for enforcement points (AutoTrader, TradeExecutor)
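As a rough illustration of how an enforcement point might consult this state, the sketch below gates an order-placing callback on `isPaused()`. The `PauseGate` interface and the `executeIfRunning` helper are hypothetical names introduced for this example; the actual checks in AutoTrader and TradeExecutor may look different.

```typescript
// Minimal stand-in for the part of HealthTracker an enforcement point needs.
// `PauseGate` and `executeIfRunning` are hypothetical, for illustration only.
interface PauseGate {
  isPaused(): boolean;
}

type GuardResult<T> =
  | { executed: true; value: T }
  | { executed: false; reason: string };

// Runs `action` only when trading is not paused; otherwise reports a skip.
function executeIfRunning<T>(gate: PauseGate, action: () => T): GuardResult<T> {
  if (gate.isPaused()) {
    return { executed: false, reason: 'Trading is PAUSED' };
  }
  return { executed: true, value: action() };
}

// Usage with an in-memory gate that can be toggled.
const state = { mode: 'RUNNING' as 'RUNNING' | 'PAUSED' };
const gate: PauseGate = { isPaused: () => state.mode === 'PAUSED' };

const first = executeIfRunning(gate, () => 'order-1'); // runs the action
state.mode = 'PAUSED';
const second = executeIfRunning(gate, () => 'order-2'); // skipped
console.log(first.executed, second.executed);
```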
---
## 2. Disk Storage (Local Persistence)
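**Location**: `bot_state.json` in the bot service root directory (resolved at runtime as `path.join(process.cwd(), 'bot_state.json')`)
**File Path** (example from a Windows development machine): `c:\Users\sarav\project\bytelyst.ai\trading\bytelyst-trading-bot-service\bot_state.json`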
**Structure**:
```json
{
  "health": {
    "tradingControl": {
      "mode": "PAUSED",
      "lastChangedBy": "admin@example.com",
      "lastChangedAt": 1708200000000,
      "reason": "Manual admin pause"
    },
    "tradingLoopHealthy": true,
    "reconciliationLoopHealthy": true,
    // ... other health metrics
  },
  "symbols": { ... },
  "orders": [ ... ],
  "history": [ ... ]
}
```
**How it's saved**:
```typescript
// bytelyst-trading-bot-service/src/services/apiServer.ts
private saveState(): void {
  const snapshot = healthTracker.getSnapshot();
  const stateToSave = {
    health: snapshot, // Includes tradingControl
    symbols: this.state.symbols,
    orders: this.state.orders,
    history: this.state.history,
    settings: this.state.settings
  };
  fs.writeFileSync(
    path.join(process.cwd(), 'bot_state.json'),
    JSON.stringify(stateToSave, null, 2)
  );
}
```
**When it's saved**:
- After every pause/resume action
- Periodically (every 30 seconds)
- On graceful shutdown
**Purpose**: Survive bot restarts, local backup
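A direct `fs.writeFileSync` can leave a truncated file behind if the process dies in the middle of a write. One common mitigation, which this document does not claim the bot uses, is to write to a temporary file and then rename it over the target. The sketch below assumes a hypothetical `writeStateAtomically` helper and runs against a scratch directory rather than the real `bot_state.json`.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical helper: write JSON to a sibling temp file, then rename it over
// the target. On POSIX file systems a rename within one directory is atomic,
// so readers see either the old file or the new one, not a partial write.
function writeStateAtomically(filePath: string, state: unknown): void {
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(state, null, 2), 'utf-8');
  fs.renameSync(tmpPath, filePath);
}

// Demo against a scratch directory (not the real bot_state.json).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bot-state-'));
const target = path.join(dir, 'bot_state.json');
writeStateAtomically(target, { health: { tradingControl: { mode: 'PAUSED' } } });

const restored = JSON.parse(fs.readFileSync(target, 'utf-8'));
console.log(restored.health.tradingControl.mode);
```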
---
## 3. Database Storage (Cloud Backup)
- **Location**: Supabase `bot_state_snapshots` table
- **Purpose**: Multi-instance recovery, cloud backup, and audit trail
- **Updated**: Throttled writes based on `DB_SNAPSHOT_INTERVAL_MS` (default: 5 minutes)
- **Columns**:
  - `user_id` (UUID, foreign key to users)
  - `state` (JSONB): contains the full bot state, including `tradingControl`
  - `created_at` (timestamp)
**How it's saved**:
```typescript
// bytelyst-trading-bot-service/src/services/apiServer.ts
private async persistSnapshotToDb(): Promise<void> {
  // Skip entirely when database snapshots are disabled
  if (!config.ENABLE_DB_SNAPSHOTS) return;

  const now = Date.now();
  const elapsed = now - this.lastSnapshotWriteAt;
  if (elapsed < config.DB_SNAPSHOT_INTERVAL_MS) {
    // Throttled: too soon since the last database write
    return;
  }

  const stateToSave = this.getPersistableState();
  // `ownerId` is defined outside this excerpt
  await supabaseService.saveBotStateSnapshot(ownerId, stateToSave);
  this.lastSnapshotWriteAt = Date.now();
}
```
**When it's saved**:
- **Throttled Periodically**: Every `DB_SNAPSHOT_INTERVAL_MS` (Default: 300,000ms / 5 minutes)
- **On Startup/Shutdown**: Forced sync if enabled
- **Note**: The manual "Pause/Resume" action triggers a request to save, but the actual DB write is still subject to the throttle to prevent IOPS spikes.
**Purpose**:
- Multi-instance recovery
- Cloud backup
- Audit trail
---
## State Recovery on Bot Restart
When the bot starts, it loads the state in this order:
```typescript
// bytelyst-trading-bot-service/src/services/apiServer.ts
public async loadState(): Promise<void> {
  try {
    // 1. Try to load from Supabase (preferred)
    const dbSnapshot = await supabaseService.getLatestBotSnapshot();
    if (dbSnapshot?.snapshot_data?.health?.tradingControl) {
      healthTracker.recordTradingControl(
        dbSnapshot.snapshot_data.health.tradingControl
      );
      logger.info('[LoadState] Restored trading control from Supabase:',
        dbSnapshot.snapshot_data.health.tradingControl.mode);
      return;
    }
  } catch (err) {
    logger.warn('[LoadState] Failed to load from Supabase, trying disk');
  }

  try {
    // 2. Fallback to disk (bot_state.json)
    const filePath = path.join(process.cwd(), 'bot_state.json');
    if (fs.existsSync(filePath)) {
      const fileData = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
      if (fileData.health?.tradingControl) {
        healthTracker.recordTradingControl(fileData.health.tradingControl);
        logger.info('[LoadState] Restored trading control from disk:',
          fileData.health.tradingControl.mode);
        return;
      }
    }
  } catch (err) {
    logger.warn('[LoadState] Failed to load from disk');
  }

  // 3. Default to RUNNING if no state found
  logger.info('[LoadState] No saved state found, defaulting to RUNNING');
  healthTracker.recordTradingControl({
    mode: 'RUNNING',
    lastChangedBy: 'system',
    lastChangedAt: Date.now(),
    reason: 'Bot startup - no previous state'
  });
}
```
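Snapshots read back from Supabase or disk are untyped JSON, so a restored `tradingControl` value could be malformed. As a sketch only, a type guard like the one below could validate the payload before it is handed to `recordTradingControl`. The field list mirrors the `bot_state.json` example in this document; the `isTradingControlSnapshot` name and the choice to treat `reason` as optional are assumptions.

```typescript
// Shape inferred from the bot_state.json example; `reason` is assumed to be
// optional because the in-memory default omits it.
interface TradingControlSnapshot {
  mode: 'RUNNING' | 'PAUSED';
  lastChangedBy: string;
  lastChangedAt: number; // epoch milliseconds
  reason?: string;
}

// Hypothetical guard: returns true only for well-formed control payloads.
function isTradingControlSnapshot(value: unknown): value is TradingControlSnapshot {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    (v.mode === 'RUNNING' || v.mode === 'PAUSED') &&
    typeof v.lastChangedBy === 'string' &&
    typeof v.lastChangedAt === 'number' &&
    (v.reason === undefined || typeof v.reason === 'string')
  );
}

// A valid payload, and one with an unknown mode and a missing timestamp.
const good: unknown = JSON.parse(
  '{"mode":"PAUSED","lastChangedBy":"admin@example.com","lastChangedAt":1708200000000}'
);
const bad: unknown = JSON.parse('{"mode":"STOPPED","lastChangedBy":"admin@example.com"}');
console.log(isTradingControlSnapshot(good), isTradingControlSnapshot(bad));
```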
---
## Persistence Flow Diagram
```
Admin clicks "Pause" button
         │
         ▼
POST /internal/trading/pause
         │
         ▼
healthTracker.recordTradingControl({ mode: 'PAUSED' })
         │
         ├─────────────────────────────────────────┐
         │                                         │
         ▼                                         ▼
1. In-Memory Update                    2. Trigger Persistence
   (Immediate)                            (Async)
         │                                         │
         ├─ tradingControl.mode = 'PAUSED'         ├─ apiServer.saveState()
         └─ isPaused() returns true                │  └─ Write to bot_state.json
                                                   │
                                                   └─ apiServer.persistSnapshotToSupabase()
                                                      └─ Upsert to bot_snapshots table

Bot Restart
         │
         ▼
apiServer.loadState()
         │
         ├─ Try Supabase (preferred)
         │  └─ SELECT * FROM bot_snapshots ORDER BY updated_at DESC LIMIT 1
         │     └─ Extract health.tradingControl
         │
         ├─ Fallback to Disk
         │  └─ Read bot_state.json
         │     └─ Extract health.tradingControl
         │
         └─ Default to RUNNING
            └─ If no state found
```
---
## 4. Global Configuration (Neural Persistence Settings)
To prevent database overload while maintaining state safety, the bot includes a synchronization throttling mechanism. These settings are managed via the **Admin Tab > Database Synchronization** panel.
### Settings stored in `bot_config` table:
| Key | Default | Description |
|-----|---------|-------------|
| `ENABLE_DB_SNAPSHOTS` | `true` | When `false`, the bot will not write snapshots to Supabase at all. |
| `DB_SNAPSHOT_INTERVAL_MS` | `300000` | Minimum time (in ms) to wait between database writes. |
### Throttling Logic:
1. The bot saves state to local `bot_state.json` **effectively immediately** (debounced 1.5s).
2. The bot checks if `ENABLE_DB_SNAPSHOTS` is true.
3. The bot checks if enough time has passed since `lastSnapshotWriteAt`.
4. Only if both pass is a write sent to Supabase.
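The database half of the throttling steps above can be expressed as a small pure function. This is a sketch under assumptions: `shouldWriteSnapshot` and the `SnapshotConfig` type are hypothetical names, and the real check lives inside the bot's snapshot persistence method. With the default interval, a pause issued one minute after the last database write would reach `bot_state.json` promptly but would wait for a later save cycle before reaching Supabase.

```typescript
// Hypothetical config shape; the keys match the settings table above.
interface SnapshotConfig {
  ENABLE_DB_SNAPSHOTS: boolean;
  DB_SNAPSHOT_INTERVAL_MS: number;
}

// Returns true when a Supabase write is allowed at time `now` (epoch ms).
function shouldWriteSnapshot(
  config: SnapshotConfig,
  lastSnapshotWriteAt: number,
  now: number
): boolean {
  if (!config.ENABLE_DB_SNAPSHOTS) return false; // step 2: feature switch
  const elapsed = now - lastSnapshotWriteAt;
  return elapsed >= config.DB_SNAPSHOT_INTERVAL_MS; // step 3: throttle window
}

const config: SnapshotConfig = { ENABLE_DB_SNAPSHOTS: true, DB_SNAPSHOT_INTERVAL_MS: 300000 };
const lastWrite = 1708200000000;

// One minute after the last write: still inside the throttle window.
const tooSoon = shouldWriteSnapshot(config, lastWrite, lastWrite + 60000);
// Exactly one interval later: the write is allowed.
const dueNow = shouldWriteSnapshot(config, lastWrite, lastWrite + 300000);
// Snapshots disabled: never written, regardless of elapsed time.
const disabled = shouldWriteSnapshot(
  { ...config, ENABLE_DB_SNAPSHOTS: false },
  lastWrite,
  lastWrite + 600000
);
console.log(tooSoon, dueNow, disabled);
```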
---
## Verification
### Check Current State
**Option 1: API Call**
```bash
curl -H "Authorization: Bearer <token>" \
http://localhost:5000/internal/trading/status
# Response:
{
"mode": "PAUSED",
"lastChangedBy": "admin@example.com",
"lastChangedAt": 1708200000000,
"reason": "Manual admin pause"
}
```
**Option 2: Check bot_state.json**
```bash
# Windows PowerShell
Get-Content "c:\Users\sarav\project\bytelyst.ai\trading\bytelyst-trading-bot-service\bot_state.json" | ConvertFrom-Json | Select-Object -ExpandProperty health | Select-Object -ExpandProperty tradingControl
```
**Option 3: Check Supabase**
```sql
SELECT snapshot_data->'health'->'tradingControl'
FROM bot_snapshots
ORDER BY updated_at DESC
LIMIT 1;
```
**Option 4: Check Logs**
```bash
# Look for these log entries
[Admin] Trading PAUSED by admin@example.com. Reason: Manual admin pause
[Admin] Trading RESUMED by admin@example.com.
[LoadState] Restored trading control from Supabase: PAUSED
```
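The `lastChangedAt` value in the outputs above is a Unix timestamp in milliseconds, which is hard to read at a glance. A small helper along these lines could turn a status payload into a one-line summary; `describeTradingControl` is a hypothetical name and not part of the bot's API.

```typescript
// Field names follow the status response shown above.
interface TradingControlStatus {
  mode: 'RUNNING' | 'PAUSED';
  lastChangedBy: string;
  lastChangedAt: number; // epoch milliseconds
  reason?: string;
}

// Hypothetical helper: formats the status as a single readable line.
function describeTradingControl(status: TradingControlStatus): string {
  const when = new Date(status.lastChangedAt).toISOString();
  const why = status.reason ? ` (${status.reason})` : '';
  return `${status.mode} since ${when} by ${status.lastChangedBy}${why}`;
}

const summary = describeTradingControl({
  mode: 'PAUSED',
  lastChangedBy: 'admin@example.com',
  lastChangedAt: 1708200000000,
  reason: 'Manual admin pause'
});
console.log(summary);
// PAUSED since 2024-02-17T20:00:00.000Z by admin@example.com (Manual admin pause)
```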
---
## Important Notes
### Persistence Guarantees
- **Immediate**: In-memory state is updated instantly
- **Durable**: Disk writes are debounced (about 1.5 seconds); database writes are throttled by `DB_SNAPSHOT_INTERVAL_MS` (default: 5 minutes)
- **Recoverable**: State survives bot restarts
- **Auditable**: All changes are logged with timestamp and user
### Failure Scenarios
| Scenario | Behavior |
|----------|----------|
| Disk write fails | State still in memory; Supabase backup available (may lag by up to `DB_SNAPSHOT_INTERVAL_MS`) |
| Supabase write fails | State still in memory and on disk |
| Both writes fail | State in memory, will retry on next save cycle |
| Bot crashes | State recovered from Supabase or disk on restart |
| Supabase unavailable on restart | Falls back to disk (bot_state.json) |
| Both unavailable on restart | Defaults to RUNNING mode |
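The table implies that the disk and Supabase writes fail independently of each other. One way to get that behavior, shown here purely as a sketch, is to run both writers concurrently and convert each outcome into a success flag so that a rejection in one does not affect the other. The `persistEverywhere` function and the writer callbacks are hypothetical stand-ins, not the bot's actual methods.

```typescript
type Writer = () => Promise<void>;

interface PersistReport {
  diskOk: boolean;
  dbOk: boolean;
}

// Hypothetical helper: attempts both writes and reports which ones succeeded.
// Failures are recorded rather than thrown, so the caller can retry on the
// next save cycle while the in-memory state stays authoritative.
async function persistEverywhere(writeDisk: Writer, writeDb: Writer): Promise<PersistReport> {
  const attempt = (write: Writer): Promise<boolean> =>
    write().then(() => true, () => false);
  const [diskOk, dbOk] = await Promise.all([attempt(writeDisk), attempt(writeDb)]);
  return { diskOk, dbOk };
}

// Demo: the disk write succeeds while the database write rejects.
const okWriter: Writer = async () => {};
const failingWriter: Writer = async () => {
  throw new Error('Supabase unavailable');
};

const reportPromise = persistEverywhere(okWriter, failingWriter);
reportPromise.then((report) => console.log(report));
```

A caller could log the report and leave the retry to the next periodic save, which matches the "will retry on next save cycle" row above.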
### State Consistency
- **Single Source of Truth**: In-memory state in HealthTracker
- **Persistence is Async**: Writes don't block trading operations
- **Recovery is Synchronous**: State loaded before trading starts
- **No Race Conditions**: All writes go through HealthTracker singleton
---
## File Locations Summary
| Storage | Location | Purpose |
|---------|----------|---------|
| In-Memory | `healthTracker.ts` singleton | Fast runtime access |
| Disk | `bot_state.json` | Local persistence |
| Database | Supabase `bot_snapshots` table | Cloud backup |
| Logs | `combined.log` | Audit trail |
---
## Code References
### Save State
- **File**: `bytelyst-trading-bot-service/src/services/apiServer.ts`
- **Method**: `saveState()` (line ~1700)
- **Method**: `persistSnapshotToSupabase()` (line ~1750)
### Load State
- **File**: `bytelyst-trading-bot-service/src/services/apiServer.ts`
- **Method**: `loadState()` (line ~1800)
### Trading Control State
- **File**: `bytelyst-trading-bot-service/src/services/healthTracker.ts`
- **Property**: `tradingControl: TradingControlSnapshot`
- **Method**: `recordTradingControl()`
- **Method**: `isPaused()`
### Pause/Resume Endpoints
- **File**: `bytelyst-trading-bot-service/src/services/apiServer.ts`
- **Endpoint**: `POST /internal/trading/pause` (line 1054)
- **Endpoint**: `POST /internal/trading/resume` (line 1073)