learning_ai_common_plat/.env.example
saravanakumardb1 321cfe7546 feat(fleet): correlation-id tracing + capacity autoscaling signal (ops #4/#5)
Thread a trace-context correlation id across the coordinator<->runner boundary
so a logical work-unit (job -> claim -> run -> ship) is stitchable end to end,
and add an advisory capacity autoscaling signal an external scaler can consume.

Tracing (#4):
- Mint/propagate a correlationId at submit from the inbound
  x-correlation-id/traceparent/x-request-id (else generate ftr_<uuid>); persist
  it on the job, inherit onto the run + lease at claim, and stamp every
  lifecycle event (submitted/assigned/transition/lease_renewed/lease_released/
  retry_scheduled/dead_letter). Children of a composite job share the parent id.
- Echo it back on the x-correlation-id response header (submit/claim/renew/
  release/patch) so a factory can carry it forward, and bind it to req.log.
- New pure trace.ts (header resolution incl. W3C traceparent trace-id).
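
The header resolution described above can be sketched roughly as follows. This is a minimal illustration, not the contents of trace.ts: the function names (`resolveCorrelationId`, `traceIdFromTraceparent`), the `HeaderBag` type, and the precedence order (x-correlation-id, then traceparent, then x-request-id) are assumptions inferred from the order the headers are listed in. Only version-00-shaped `traceparent` values are handled here.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical header shape; Node/Fastify lower-case incoming header names.
type HeaderBag = Record<string, string | string[] | undefined>;

// W3C traceparent layout: version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags.
const TRACEPARENT_RE = /^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$/;

// Return the first non-empty, trimmed value of a possibly repeated header.
function firstValue(v: string | string[] | undefined): string | undefined {
  const s = Array.isArray(v) ? v[0] : v;
  const t = s?.trim();
  return t ? t : undefined;
}

// Extract the trace-id field, rejecting malformed values and the all-zero id,
// which the W3C Trace Context spec defines as invalid.
export function traceIdFromTraceparent(value: string): string | undefined {
  const m = TRACEPARENT_RE.exec(value.trim().toLowerCase());
  const traceId = m?.[1];
  if (!traceId || /^0+$/.test(traceId)) return undefined;
  return traceId;
}

// Assumed precedence: explicit correlation id, then trace-id, then request id,
// else mint a fresh ftr_<uuid> as the commit message describes.
export function resolveCorrelationId(headers: HeaderBag): string {
  const explicit = firstValue(headers["x-correlation-id"]);
  if (explicit) return explicit;

  const traceparent = firstValue(headers["traceparent"]);
  const traceId = traceparent ? traceIdFromTraceparent(traceparent) : undefined;
  if (traceId) return traceId;

  const requestId = firstValue(headers["x-request-id"]);
  if (requestId) return requestId;

  return `ftr_${randomUUID()}`;
}
```

Keeping the resolver free of I/O, as the "pure" wording suggests, means it can be unit-tested with plain objects and reused at submit, claim, renew, release, and patch.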

Autoscaling signal (#5):
- New pure autoscaler.ts turns a product's FleetMetrics + saturation alerts
  (no_live_capacity/saturated/queue_starvation) into an auditable scale
  recommendation (action/recommendedSeats/delta/urgency/signals).
  budget_exhausted suppresses scale-out; idle slack reclaims down to a floor.
  Thresholds tunable via FLEET_AUTOSCALE_* env.
- GET /fleet/autoscale (per-product) + GET /fleet/autoscale/all (global, admin
  or scrape token). Documented the env vars in .env.example.
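
A rough sketch of how such a recommendation could be derived is shown below. It is not the autoscaler.ts implementation: the metric fields (`totalSeats`, `busySeats`, `queuedJobs`), the action and urgency values, the `scale_out_suppressed` signal name, and the step-sizing rule are all hypothetical. The four env keys and their defaults (85, 20, 5, 0) come from .env.example; what each threshold means is inferred from its name.

```typescript
type Alert = "no_live_capacity" | "saturated" | "queue_starvation" | "budget_exhausted";

// Hypothetical subset of FleetMetrics; the real shape is not shown in the source.
interface FleetMetricsLike { totalSeats: number; busySeats: number; queuedJobs: number; }

interface Thresholds { scaleOutPct: number; scaleInPct: number; maxStep: number; minSeats: number; }

interface Recommendation {
  action: "scale_out" | "scale_in" | "hold"; // assumed value set
  recommendedSeats: number;
  delta: number;
  urgency: "high" | "normal" | "none"; // assumed value set
  signals: string[];
}

// Parse a non-negative integer, falling back when the key is unset or invalid.
function intFromEnv(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n >= 0 ? n : fallback;
}

export function thresholdsFromEnv(env: Record<string, string | undefined>): Thresholds {
  return {
    scaleOutPct: intFromEnv(env.FLEET_AUTOSCALE_SCALE_OUT_PCT, 85),
    scaleInPct: intFromEnv(env.FLEET_AUTOSCALE_SCALE_IN_PCT, 20),
    maxStep: intFromEnv(env.FLEET_AUTOSCALE_MAX_STEP, 5),
    minSeats: intFromEnv(env.FLEET_AUTOSCALE_MIN_SEATS, 0),
  };
}

export function recommend(m: FleetMetricsLike, alerts: Alert[], t: Thresholds): Recommendation {
  const signals: string[] = [...alerts];
  const utilPct = m.totalSeats > 0 ? (m.busySeats / m.totalSeats) * 100 : 0;
  const pressure =
    alerts.includes("no_live_capacity") || alerts.includes("saturated") ||
    alerts.includes("queue_starvation") || utilPct >= t.scaleOutPct;

  let delta = 0;
  if (pressure) {
    if (alerts.includes("budget_exhausted")) {
      // Budget exhaustion suppresses scale-out, so the recommendation holds.
      signals.push("scale_out_suppressed");
    } else {
      // Assumed sizing: one seat per queued job, at least 1, capped by maxStep.
      delta = Math.min(t.maxStep, Math.max(1, m.queuedJobs));
    }
  } else if (utilPct <= t.scaleInPct && m.queuedJobs === 0) {
    // Reclaim idle slack, never dropping below the minSeats floor.
    const idle = m.totalSeats - m.busySeats;
    const room = Math.max(0, m.totalSeats - t.minSeats);
    delta = 0 - Math.min(t.maxStep, idle, room);
  }

  const action: Recommendation["action"] = delta > 0 ? "scale_out" : delta < 0 ? "scale_in" : "hold";
  const urgency: Recommendation["urgency"] =
    delta > 0 && alerts.includes("no_live_capacity") ? "high" : delta !== 0 ? "normal" : "none";
  return { action, recommendedSeats: m.totalSeats + delta, delta, urgency, signals };
}
```

A caller would typically pass `thresholdsFromEnv(process.env)`. Because the result is advisory, the sketch only computes a recommendation and leaves any actual scaling to the external scaler.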

Tests: +29 (trace 10, tracing 7, autoscaler 12); full suite 1846 green; lint + tsc clean.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 22:43:56 -07:00


# ── Common Platform Environment Variables ──────────────────────
# Copy to .env and fill in real values.
# ── Azure Key Vault (optional — secrets fall back to env vars) ─
# Set this to resolve secrets from AKV instead of .env:
AZURE_KEYVAULT_URL=
# ── Cosmos DB (prototype defaults to local emulator) ───────────
# For the Docker prototype stack, leave these pointed at the local emulator.
# When you move to a managed environment later, replace them with real Azure values.
COSMOS_ENDPOINT=http://cosmos-emulator:8081
COSMOS_KEY=<cosmos-emulator-key>
COSMOS_DATABASE=lysnrai
# ── Auth (platform-service) ─────────────────────────
JWT_SECRET=change-me-prototype-jwt-secret
RATE_LIMIT_STORE_MODE=datastore
RATE_LIMIT_CONFIG_JSON=
API_KEY_RATE_LIMIT_CONFIG_JSON=
API_KEY_PRODUCT_RATE_LIMIT_CONFIG_JSON=
# ── Azure Blob Storage (platform-service) ─────────────────────
STORAGE_PROVIDER=azure
AZURE_BLOB_CONNECTION_STRING=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=<azurite-default-key>;BlobEndpoint=http://azurite:10000/devstoreaccount1;
AZURE_BLOB_ACCOUNT_NAME=devstoreaccount1
AZURE_BLOB_ACCOUNT_KEY=<azurite-default-key>
AZURE_BLOB_PUBLIC_ENDPOINT=http://localhost:10000/devstoreaccount1
# ── Stripe (platform-service) ────────────────────────
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRICE_PRO=price_...
STRIPE_PRICE_ENTERPRISE=price_...
# ── Email Delivery (platform-service) ─────────────────────────
# Use `smtp` for a self-hosted SMTP relay such as Mailpit, Postal, or Mailcow.
EMAIL_PROVIDER=smtp
EMAIL_FROM_ADDRESS=noreply@bytelyst.local
EMAIL_FROM_NAME=ByteLyst
SMTP_HOST=mailpit
SMTP_PORT=1025
SMTP_SECURE=false
SMTP_USER=
SMTP_PASSWORD=
# ── Telegram / Slack Notifications ────────────────────────────
TELEGRAM_BOT_TOKEN=
TELEGRAM_DEFAULT_CHAT_ID=
SLACK_WEBHOOK_URL=
SLACK_DEFAULT_CHANNEL=
# ── Event Bus ─────────────────────────────────────────────────
EVENT_BUS_BACKEND=file
EVENT_BUS_FILE=.data/platform-events.json
EVENT_BUS_POLL_MS=100
EVENT_BUS_LEASE_MS=30000
# ── Extraction Service (port 4005 + Python sidecar 4006) ─────
PYTHON_SIDECAR_URL=http://localhost:4006
DEFAULT_MODEL_ID=gemini-2.5-flash
GEMINI_API_KEY=your-gemini-api-key
EXTRACTION_QUEUE_BACKEND=file
EXTRACTION_QUEUE_FILE=.data/extraction-jobs.json
EXTRACTION_QUEUE_POLL_MS=100
EXTRACTION_QUEUE_LEASE_MS=30000
# ── Webhooks (optional — fire-and-forget callbacks) ──────────
WEBHOOK_INVITATION_REDEEMED_URL=
WEBHOOK_REFERRAL_STATUS_URL=
WEBHOOK_WAITLIST_JOINED_URL=
# ── Telemetry (platform-service) ──────────────────────────────
TELEMETRY_ENABLED=true
TELEMETRY_ALERT_WEBHOOK_URL=
TELEMETRY_GEO_API_URL=http://ip-api.com/json
TELEMETRY_EVENT_TTL_DAYS=90
# ── Field Encryption (@bytelyst/field-encrypt) ──────────────
# Key provider: 'akv' (production) | 'env' (dev/staging) | 'memory' (tests)
FIELD_ENCRYPT_KEY_PROVIDER=memory
# Hex-encoded 32-byte key — only for 'env' provider (like AUTH_TOTP_ENCRYPTION_KEY)
FIELD_ENCRYPT_KEY=
# Product-specific MEK name in AKV — only for 'akv' provider
FIELD_ENCRYPT_MEK_NAME=lysnr-mek
# ── Gitea NPM Registry (private @bytelyst packages) ─────────
# Token for authenticating with the Gitea npm registry.
# Generate at: http://<GITEA_NPM_HOST>:3300/user/settings/applications
GITEA_NPM_TOKEN=
GITEA_NPM_HOST=localhost
GITEA_NPM_OWNER=learning_ai_user
# ── Product Identity ──────────────────────────────────────────
DEFAULT_PRODUCT_ID=lysnrai
# ── Cowork Service (port 4009 — Fastify bridge to Rust runtime) ─
# cowork-service forwards auth, flags, audit, telemetry, and AI budgets to
# platform-service. The Anthropic key is only needed when running the Rust
# runtime locally via IPC; in the containerised dev stack it is optional.
ANTHROPIC_API_KEY=
RUST_RUNTIME_BIN=cowork-orchestrator
RUST_RUNTIME_TIMEOUT_MS=300000
OLLAMA_URL=http://localhost:11434/v1
OLLAMA_MODELS=
FEATURE_FLAGS_ENABLED=true
# ── Fleet ops/observability ───────────────────────────────────
# Bearer token Prometheus uses to scrape GET /api/fleet/metrics/prom. Must match
# the `credentials` in services/monitoring/prometheus/prometheus.yml. When unset,
# the endpoint requires an admin JWT instead (so it is never world-readable).
FLEET_METRICS_TOKEN=changeme-fleet-metrics-token
# Fleet feature flags (default OFF): cost/latency routing, per-engine breaker,
# per-product/-engine budget enforcement, and multi-tenant access enforcement.
FLEET_COST_ROUTING=
FLEET_ENGINE_BREAKER=
FLEET_BUDGETS=
FLEET_TENANT_ENFORCEMENT=
# Capacity autoscaling signal (§5) — tunes the advisory scale recommendation
# served at GET /api/fleet/autoscale[/all] (consumed by an external scaler).
# All optional; unset keys fall back to the in-code defaults shown below.
FLEET_AUTOSCALE_SCALE_OUT_PCT=85
FLEET_AUTOSCALE_SCALE_IN_PCT=20
FLEET_AUTOSCALE_MAX_STEP=5
FLEET_AUTOSCALE_MIN_SEATS=0