saravanakumardb1 fb3bc750eb fix: update .env.example comments, Grafana dashboard, and debug-service.md for consolidated services

2026-02-14 22:01:55 -08:00


Debug Service Skill

Description: Systematic methodology for diagnosing and fixing failing services or endpoints across the entire stack.

When to Use

A service is returning errors or behaving unexpectedly
Health checks are failing
Tests are failing in CI or locally
Users report broken functionality

Prerequisites

Access to service logs
Terminal with curl installed
Ability to run tests locally

Steps

1. Identify the Failing Service

Quickly locate which service is affected:

# Service locations reference:
# Backend API (Python/FastAPI) → backend/src/
# Billing Service (Fastify) → ../learning_ai_common_plat/services/billing-service/src/
# Growth Service (Fastify) → ../learning_ai_common_plat/services/growth-service/src/
# Platform Service (Fastify) → ../learning_ai_common_plat/services/platform-service/src/
# Tracker Service (Fastify) → ../learning_ai_common_plat/services/tracker-service/src/
# Admin Dashboard (Next.js) → admin-dashboard-web/src/
# User Dashboard (Next.js) → user-dashboard-web/src/
# Tracker Dashboard (Next.js) → tracker-dashboard-web/src/
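
The reference above can be wrapped in a small lookup so that later commands (a `grep`, an editor, a test run) can be pointed at the right directory. This is a minimal sketch: the function name `service_dir` and the short service keys are hypothetical and chosen for illustration; only the paths come from the list above.

```shell
# Hypothetical helper: map a short service key to its source directory.
# The keys are illustrative; the paths are copied from the reference list.
service_dir() {
  case "$1" in
    backend)
      echo "backend/src/" ;;
    billing|growth|platform|tracker)
      echo "../learning_ai_common_plat/services/$1-service/src/" ;;
    admin-dashboard|user-dashboard|tracker-dashboard)
      echo "$1-web/src/" ;;
    *)
      echo "unknown service: $1" >&2
      return 1 ;;
  esac
}
```

For example, `grep -rn "usage" "$(service_dir billing)"` would search the billing service sources, assuming the command is run from the repository root that the relative paths expect.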

2. Check Health Status

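# Check port 8000 and port 4003 (the consolidated platform service).
# The checks run separately so a failure of the first does not skip the second;
# -f makes curl exit non-zero on an HTTP error status.
curl -sf http://localhost:8000/health; echo " :8000 exit=$?"
curl -sf http://localhost:4003/health; echo " :4003 exit=$?"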
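
When more than a couple of endpoints need checking, a loop that reports every result is easier to read than chained commands. The following is a sketch, not part of the project's tooling: `check_health` is a hypothetical name, and the `name=url` labels are free-form (calling port 8000 `backend` in the usage note is an assumption).

```shell
# Hypothetical helper: probe each "name=url" pair and report every result,
# without stopping at the first failure. Returns 1 if any probe failed.
check_health() {
  local failed=0 pair name url
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first '='
    url="${pair#*=}"     # text after the first '='
    # -f: non-zero exit on HTTP status >= 400; --max-time: do not hang on a stuck service.
    if curl -sf --max-time 3 -o /dev/null "$url"; then
      echo "OK   $name ($url)"
    else
      echo "FAIL $name ($url)"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `check_health backend=http://localhost:8000/health platform=http://localhost:4003/health`. The exit status makes it usable in scripts, for example `check_health ... || exit 1`.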

3. Examine Logs

For local development:

# Show the last 50 lines of each listed log (add other services' log files as needed)
tail -n 50 .logs/backend.log .logs/platform-service.log 2>/dev/null

For Docker:

docker compose logs --tail=50 <service-name>
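
Fifty lines of log output can still bury the one line that matters. As a sketch, the helper below narrows the tail of a log to error-like lines. The name `recent_errors` is hypothetical, and the keywords (`error`, `exception`, `traceback`) are assumptions about what the services log; adjust them to the actual log format.

```shell
# Hypothetical helper: print error-like lines among the last N lines
# (default 50) of a log file. Exits non-zero when nothing matches.
recent_errors() {
  local file="$1" lines="${2:-50}"
  tail -n "$lines" "$file" | grep -iE 'error|exception|traceback'
}
```

Usage: `recent_errors .logs/backend.log 200`. The same filter can be applied to container output with `docker compose logs --tail=200 <service-name> | grep -iE 'error|exception|traceback'`.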

4. Reproduce the Issue

API errors: Use curl with verbose output

curl -v -X POST http://localhost:8000/api/endpoint -H "Content-Type: application/json" -d '{"key":"value"}'

UI errors: Check browser console and network tab

Test failures: Run specific tests with verbose output

# Python
python -m pytest tests/test_specific.py -v -x

# TypeScript/Vitest
pnpm test --reporter=verbose specific.test.ts
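
For API errors, the HTTP status code alone often says where to look next. The sketch below classifies a status code by its standard HTTP class; `classify_status` is a hypothetical name, and the hints in the messages are general guidance rather than project-specific rules.

```shell
# Hypothetical helper: turn an HTTP status code into a hint about where to look.
# "000" is what curl's %{http_code} prints when no response was received.
classify_status() {
  case "$1" in
    000) echo "no response: is the service running?" ;;
    2??) echo "success" ;;
    3??) echo "redirect" ;;
    4??) echo "client error: check the request (payload, headers, auth)" ;;
    5??) echo "server error: check the service logs" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}
```

Usage: `classify_status "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8000/api/endpoint)"`, where `/api/endpoint` is the same placeholder path used in the curl example above.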

5. Fix Methodology

Follow this order to avoid common pitfalls:

Read the test first - Understand what the expected behavior is
Read the source code - Trace the execution path
Fix the source, NOT the test - Unless the test itself is wrong
Add a regression test - If none exists for this bug
Run the full test suite - Ensure no new issues were introduced

6. Verify the Fix

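# Python tests
python -m pytest tests/ backend/tests/ -v --tb=short -x

# TypeScript services (subshell, so the working directory is restored afterwards)
(cd ../learning_ai_common_plat && pnpm --filter @lysnrai/<service-name> test)

# Dashboard builds (subshell, so the working directory is restored afterwards)
(cd admin-dashboard-web && npm run build)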

7. Commit with Proper Format

git add .
git commit -m "fix(<scope>): <description>"
# Examples:
# fix(billing): handle null subscription in usage endpoint
# fix(platform): validate JWT token in auth middleware
# fix(admin): resolve dashboard loading state issue
git push
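
The commit format can be checked mechanically before pushing. This is a minimal sketch that only covers the `fix(<scope>): <description>` shape shown above; `valid_commit_msg` is a hypothetical name, and the restriction of the scope to lowercase letters, digits, and hyphens is an assumption.

```shell
# Hypothetical helper: succeed only if the first line of a commit message
# looks like "fix(<scope>): <description>".
valid_commit_msg() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq '^fix\([a-z0-9-]+\): .+'
}
```

Usage: `valid_commit_msg "$(git log -1 --pretty=%s)" || echo "commit message does not match fix(<scope>): <description>"`. The same function could be called from a `commit-msg` hook, which git invokes with the path of the message file as its first argument.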

Common Patterns

Database Connection Issues

# Check Cosmos DB connectivity
curl -s http://localhost:4003/health | jq .
# Look for database errors in logs (-E gives portable alternation across grep implementations)
grep -iE "database|cosmos|connection" .logs/*.log

Authentication Issues

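# Decode the JWT payload to check its contents.
# JWT segments are unpadded base64url, so map -_ to +/ and restore the padding first.
p=$(echo "<jwt-token>" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#p} % 4 )) -ne 0 ]; do p="$p="; done
echo "$p" | base64 -d | jq .
# Check that the auth endpoint accepts the token
curl -s http://localhost:4003/api/auth/me -H "Authorization: Bearer <token>"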

Service Dependencies

# Check if dependent services are running
docker compose ps
# Verify service communication
curl -s http://localhost:4003/health | jq .  # consolidated platform service
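
A dependent service may simply not have finished starting, in which case a single health check gives a misleading failure. As a sketch, the loop below polls a health URL a few times before giving up. `wait_for_health` is a hypothetical name, and the defaults (10 attempts, 2 seconds apart) are arbitrary.

```shell
# Hypothetical helper: poll a health URL until it responds or attempts run out.
# Arguments: url [attempts=10] [delay_seconds=2]
wait_for_health() {
  local url="$1" attempts="${2:-10}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf --max-time 3 -o /dev/null "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still down after $attempts attempt(s): $url"
  return 1
}
```

Usage: `wait_for_health http://localhost:4003/health 15 2 && curl -s http://localhost:4003/health | jq .`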

Notes

Always check logs first - Most issues have clear error messages
Isolate the problem - Don't change multiple things at once
Document the fix - Add comments if the issue was non-obvious
Consider edge cases - Think about what might cause this to fail again

Local Development Setup - Start services locally
Production Readiness - Comprehensive validation
Test Strategies - Writing effective tests
