# Prometheus Metrics Guide
The Bytelyst Trading Bot exposes a Prometheus `/metrics` endpoint for monitoring and Grafana integration.
## Scraping Configuration
The metrics are available at `http://<bot-ip>:5000/metrics`.
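
The endpoint returns the Prometheus plain-text exposition format. As a rough illustration of what a scraper sees, the sketch below parses such a body into samples. `Sample` and `parseMetrics` are hypothetical names, not part of the bot, and the parser is deliberately simplified: it skips comment lines, ignores optional timestamps, and does not handle escaped quotes inside label values or special values such as `+Inf`.

```typescript
// One parsed sample from the Prometheus text exposition format.
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Parse the plain-text body returned by a /metrics endpoint (simplified).
function parseMetrics(body: string): Sample[] {
  const samples: Sample[] = [];
  for (const raw of body.split("\n")) {
    const line = raw.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === "" || line.charAt(0) === "#") continue;
    // metric_name, optional {label="value",...}, then the sample value.
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(line);
    if (match === null) continue;
    const labels: Record<string, string> = {};
    const labelRe = /([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g;
    let pair: RegExpExecArray | null;
    while ((pair = labelRe.exec(match[2] ?? "")) !== null) {
      labels[pair[1]] = pair[2];
    }
    samples.push({ name: match[1], labels, value: Number(match[3]) });
  }
  return samples;
}
```

In Node.js 18 or later, the body could be obtained with the built-in `fetch` against the URL above and passed to `parseMetrics`. In practice a Prometheus server does the scraping, so this is only useful for ad hoc checks.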
## Exported Metrics
### System Health & Events
- `bytelyst_bot_operational_events_total` (Counter)
  - **Description**: Cumulative count of system events (orders, errors, warnings).
  - **Labels**: `severity`, `type`, `profile_id`, `symbol`, `env`, `mode`.
  - **Usage**: Monitor for spikes in `severity="ERROR"` or `type="ORDER_FAILURE"`.
### Subsystem Performance
- `bytelyst_bot_subsystem_duration_seconds` (Histogram)
  - **Description**: Execution time for core loops (trading, monitor, reconciliation).
  - **Labels**: `subsystem`, `env`, `mode`.
  - **Buckets**: `[0.1, 0.25, 0.5, 1, 2, 5, 10]`.
  - **Usage**: Identify slow execution cycles or database contention.
- `bytelyst_bot_subsystem_last_run_timestamp` (Gauge)
  - **Description**: Unix timestamp of the last successful subsystem run.
  - **Labels**: `subsystem`, `env`, `mode`.
  - **Usage**: Verify how recently each process checked in.
- `bytelyst_bot_subsystem_alive` (Gauge)
  - **Description**: Binary flag indicating if a subsystem is fresh (1) or stalled (0).
  - **Labels**: `subsystem`, `env`, `mode`.
  - **Usage**: Critical dashboard "Traffic Light" indicator.
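
To make the histogram buckets and the alive flag more concrete, here is a minimal sketch. It assumes standard Prometheus histogram semantics (buckets are cumulative) and a hypothetical staleness rule in which a subsystem counts as alive if its last successful run is no older than `maxAgeSeconds`. The function names and the threshold are illustrative assumptions; the bot's actual staleness rule is not documented here.

```typescript
// Upper bounds of the duration buckets listed above, in seconds.
const DURATION_BUCKETS = [0.1, 0.25, 0.5, 1, 2, 5, 10];

// Prometheus histogram buckets are cumulative: each observation is counted in
// every bucket whose upper bound is >= the observed value, plus "+Inf".
function cumulativeBucketCounts(
  observations: number[],
  bounds: number[],
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const bound of bounds) {
    counts[String(bound)] = observations.filter((v) => v <= bound).length;
  }
  counts["+Inf"] = observations.length;
  return counts;
}

// Hypothetical staleness rule (an assumption, not the bot's confirmed logic):
// 1 if the last successful run is at most maxAgeSeconds old, otherwise 0.
function aliveFlag(
  lastRunTimestamp: number, // Unix seconds, as in the last-run gauge
  nowSeconds: number,
  maxAgeSeconds: number,
): 0 | 1 {
  return nowSeconds - lastRunTimestamp <= maxAgeSeconds ? 1 : 0;
}
```

With these buckets, a cycle that takes 7 seconds is counted only under the `10` and `+Inf` bounds, so a growing gap between the `5` and `10` buckets points at slow cycles.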
### Exchange Connectivity
- `bytelyst_bot_exchange_api_latency_seconds` (Histogram)
  - **Description**: Latency of external API calls to the exchange.
  - **Labels**: `exchange`, `operation`, `env`, `mode`.
  - **Usage**: Distinguish between internal bot lag and exchange infrastructure lag.
### Risk & Capital Invariants
- `bytelyst_bot_capital_invariant_violations_total` (Counter)
  - **Description**: Count of times available capital fell below zero for a profile.
  - **Labels**: `profile_id`, `env`, `mode`.
  - **Usage**: Critical alert metric. The value should always be 0, so alert on any increase.
- `bytelyst_bot_profile_utilization_percent` (Gauge)
  - **Description**: Percentage of allocated capital currently in use (positions + open orders).
  - **Labels**: `profile_id`, `env`, `mode`.
  - **Usage**: Monitor capital efficiency and exposure.
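
The two capital metrics can be read as simple arithmetic over a profile's balances. The sketch below follows the descriptions above; the `ProfileCapital` shape, the field names, and the choice to report 0% for an unfunded profile are assumptions made for illustration.

```typescript
// Hypothetical view of one profile's capital (field names are assumptions).
interface ProfileCapital {
  allocated: number;    // capital assigned to the profile
  inPositions: number;  // capital tied up in open positions
  inOpenOrders: number; // capital reserved by open orders
}

// Percentage of allocated capital in use (positions + open orders).
function utilizationPercent(p: ProfileCapital): number {
  if (p.allocated <= 0) return 0; // assumption: unfunded profiles report 0
  return ((p.inPositions + p.inOpenOrders) / p.allocated) * 100;
}

// Capital still free after positions and open orders are accounted for.
function availableCapital(p: ProfileCapital): number {
  return p.allocated - p.inPositions - p.inOpenOrders;
}

// The invariant is violated when available capital drops below zero;
// each such occurrence would increment the violations counter.
function violatesCapitalInvariant(p: ProfileCapital): boolean {
  return availableCapital(p) < 0;
}
```

For example, a profile with 1000 allocated, 300 in positions, and 200 in open orders is 50% utilized with 500 available, while 900 in positions plus 200 in open orders would breach the invariant.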
### Data Integrity (Reconciliation)
- `bytelyst_bot_reconciliation_mismatches_total` (Counter)
  - **Description**: Count of detected mismatches between local state and exchange state.
  - **Labels**: `env`, `mode`.
  - **Usage**: Track consistency of the "single source of truth."
- `bytelyst_bot_reconciliation_missing_items_count` (Gauge)
  - **Description**: Number of missing orders/positions in the last sync cycle.
  - **Labels**: `source` (db/exchange), `env`, `mode`.
  - **Usage**: Identify synchronization drift.
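
One way to picture the missing-items gauge is as a set difference between the IDs known locally and the IDs reported by the exchange. The sketch below assumes that the `source` label names the side on which the item is missing; the guide does not spell this out, so treat that mapping, and the function names, as hypothetical.

```typescript
// Return the IDs in `candidates` that do not appear in `reference`.
function missingFrom(candidates: string[], reference: string[]): string[] {
  // A prototype-free object serves as a simple lookup table.
  const seen: Record<string, boolean> = Object.create(null);
  for (const id of reference) seen[id] = true;
  return candidates.filter((id) => seen[id] !== true);
}

// Gauge values per `source` label for one sync cycle.
// Assumption: source="db" counts items the exchange has but the database
// lacks, and source="exchange" counts the reverse.
function missingItemCounts(
  dbIds: string[],
  exchangeIds: string[],
): { db: number; exchange: number } {
  return {
    db: missingFrom(exchangeIds, dbIds).length,
    exchange: missingFrom(dbIds, exchangeIds).length,
  };
}
```

A non-zero value that persists across several sync cycles suggests drift that reconciliation is not repairing, rather than a transient race between the two sides.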
## Default Labels
Every metric is automatically tagged with:
- `env`: `development` or `production` (from `NODE_ENV`).
- `mode`: `paper` or `live` (from `PAPER_TRADING`).
- `app`: `bytelyst-trading-bot-service`.
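
As an illustration of how these labels could be derived from the environment, here is a minimal sketch. The exact parsing rules are assumptions: it treats any `NODE_ENV` other than `production` as `development`, and it selects `live` only when `PAPER_TRADING` is the string `false`, defaulting to `paper` otherwise. The bot's real handling of these variables may differ.

```typescript
// Environment variables as a plain lookup, e.g. process.env in Node.js.
type Env = Record<string, string | undefined>;

interface DefaultLabels {
  env: "development" | "production";
  mode: "paper" | "live";
  app: string;
}

// Derive the default label set from environment variables.
// Assumed rules: only NODE_ENV=production yields "production", and only
// PAPER_TRADING=false yields "live" (paper trading is the safe default).
function defaultLabels(env: Env): DefaultLabels {
  return {
    env: env["NODE_ENV"] === "production" ? "production" : "development",
    mode: env["PAPER_TRADING"] === "false" ? "live" : "paper",
    app: "bytelyst-trading-bot-service",
  };
}
```

Because these labels are attached to every series, dashboards and alerts can filter on `env` and `mode` to keep paper-trading noise away from production alerting.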