# Admin Observability & Health Panel

This document describes the runtime observability system implemented for the trading bot administrators.

## Overview

The Admin Error & Health panel provides real-time visibility into the bot's internal state and actionable issues. It is designed for operators to quickly identify why trading might be paused, failing, or behaving unexpectedly without having to dig through raw logs.

## Architecture

### Backend: `ObservabilityService`

- **In-Memory Buffer**: Stores the last 50 operational events in a ring buffer.
- **Structured Events**: Every event follows the `OperationalEvent` interface.
- **Filtering**: Events are filtered by user role. Only administrators receive operational events via the API and Socket.IO.

### Frontend: `AdminTab` (System Health)

- **Status Badge**: A global indicator of system health (Healthy, Degraded, Critical).
- **Event List**: A chronologically ordered list of recent operational events with severity levels (INFO, WARN, ERROR).
- **Telemetry**: Real-time display of execution loop durations, exchange latency, and lock contention counts.

## Operational Event Types

| Type | Severity | Description |
|------|----------|-------------|
| `INSUFFICIENT_BUYING_POWER` | WARN | Attempted to open a position but broker reported insufficient capital. |
| `ORDER_FAILURE` | ERROR | Exchange rejected an order (e.g., price out of bounds, invalid qty). |
| `EXCHANGE_STATE_MISMATCH` | WARN | Discrepancy detected between internal database and exchange state. |
| `RECONCILIATION_DEGRADED` | ERROR | Reconciliation loop is failing repeatedly. |
| `SYSTEM_ERROR` | WARN/ERROR | General system issues, including exchange API timeouts or manual pauses. |

## Security & Performance

- **Sensitive Data**: Events contain structured messages instead of raw stack traces or internal environment variables.
- **Cap**: Both backend buffer and frontend display are capped at 50 events to ensure performance and prevent memory bloating.
- **RBAC**: Operational events are only pushed to authenticated sockets belonging to users with the `admin` role.

## Usage

1. Navigate to the **Admin** tab.
2. Select **System Health**.
3. Review the **Operational Events** list for recent issues.
4. If a global red banner appears at the top of the dashboard, it indicates that a critical operational event occurred in the last 10 minutes.
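
## Example Sketch

The 50-event ring buffer, the admin-only filtering, and the 10-minute banner rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual `ObservabilityService` code: the fields of `OperationalEvent`, the `EventRingBuffer` class, and the helpers `eventsForRole` and `hasRecentCritical` are all hypothetical names chosen for this example. It also assumes that an `ERROR`-severity event is what counts as "critical" for the banner, which this document does not state explicitly.

```typescript
// Assumed event shape: the document names the OperationalEvent interface
// but does not list its fields, so these are illustrative only.
type Severity = "INFO" | "WARN" | "ERROR";

interface OperationalEvent {
  type: string;       // e.g. "ORDER_FAILURE"
  severity: Severity;
  message: string;    // structured message, never a raw stack trace
  timestamp: number;  // epoch milliseconds
}

// Fixed-capacity ring buffer: once full, each push overwrites the oldest event.
class EventRingBuffer {
  private readonly capacity: number;
  private readonly slots: (OperationalEvent | undefined)[];
  private next = 0;   // index of the slot the next push will write
  private count = 0;  // number of events currently stored

  constructor(capacity: number = 50) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<OperationalEvent | undefined>(capacity);
  }

  push(event: OperationalEvent): void {
    this.slots[this.next] = event;
    this.next = (this.next + 1) % this.capacity;
    if (this.count < this.capacity) {
      this.count++;
    }
  }

  // Returns the stored events in insertion order, oldest first.
  snapshot(): OperationalEvent[] {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out: OperationalEvent[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity] as OperationalEvent);
    }
    return out;
  }
}

// Role filter: only the "admin" role sees operational events (hypothetical helper).
function eventsForRole(buffer: EventRingBuffer, role: string): OperationalEvent[] {
  return role === "admin" ? buffer.snapshot() : [];
}

// Banner rule, assuming "critical" means an ERROR event within the last 10 minutes.
function hasRecentCritical(
  events: OperationalEvent[],
  nowMs: number,
  windowMs: number = 10 * 60 * 1000,
): boolean {
  return events.some((e) => {
    const age = nowMs - e.timestamp;
    return e.severity === "ERROR" && age >= 0 && age <= windowMs;
  });
}
```

In a setup like this, the service would call `push` whenever an operational event is raised, and both the API handler and the Socket.IO emitter would go through `eventsForRole` so that the role check lives in one place. Overwriting in place keeps memory use constant regardless of how many events are raised.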