152 lines
4.0 KiB
Markdown
152 lines
4.0 KiB
Markdown
# Debug Service Skill
|
|
|
|
**Description**: Systematic methodology for diagnosing and fixing failing services or endpoints across the entire stack.
|
|
|
|
## When to Use
|
|
|
|
- Any service is returning errors or unexpected behavior
|
|
- Health checks are failing
|
|
- Tests are failing in CI or locally
|
|
- Users report broken functionality
|
|
|
|
## Prerequisites
|
|
|
|
- Access to service logs
|
|
- Terminal with curl installed
|
|
- Ability to run tests locally
|
|
|
|
## Steps
|
|
|
|
### 1. Identify the Failing Service
|
|
|
|
Quickly locate which service is affected:
|
|
|
|
```bash
|
|
# Service locations reference:
|
|
# Backend API (Python/FastAPI) → backend/src/
|
|
# Billing Service (Fastify) → ../learning_ai_common_plat/services/billing-service/src/
|
|
# Growth Service (Fastify) → ../learning_ai_common_plat/services/growth-service/src/
|
|
# Platform Service (Fastify) → ../learning_ai_common_plat/services/platform-service/src/
|
|
# Tracker Service (Fastify) → ../learning_ai_common_plat/services/tracker-service/src/
|
|
# Admin Dashboard (Next.js) → admin-dashboard-web/src/
|
|
# User Dashboard (Next.js) → user-dashboard-web/src/
|
|
# Tracker Dashboard (Next.js) → tracker-dashboard-web/src/
|
|
```
|
|
|
|
### 2. Check Health Status
|
|
|
|
```bash
|
|
# Check all services at once
|
|
curl -s http://localhost:8000/health && \
|
|
curl -s http://localhost:4003/health
|
|
```
|
|
|
|
### 3. Examine Logs
|
|
|
|
**For local development:**
|
|
|
|
```bash
|
|
# Check recent logs from all services
|
|
tail -50 .logs/backend.log .logs/platform-service.log 2>/dev/null | head -100
|
|
```
|
|
|
|
**For Docker:**
|
|
|
|
```bash
|
|
docker compose logs --tail=50 <service-name>
|
|
```
|
|
|
|
### 4. Reproduce the Issue
|
|
|
|
- **API errors**: Use curl with verbose output
|
|
```bash
|
|
curl -v -X POST http://localhost:8000/api/endpoint -H "Content-Type: application/json" -d '{"key":"value"}'
|
|
```
|
|
- **UI errors**: Check browser console and network tab
|
|
- **Test failures**: Run specific tests with verbose output
|
|
|
|
```bash
|
|
# Python
|
|
python -m pytest tests/test_specific.py -v -x
|
|
|
|
# TypeScript/Vitest
|
|
pnpm test --reporter=verbose specific.test.ts
|
|
```
|
|
|
|
### 5. Fix Methodology
|
|
|
|
Follow this order to avoid common pitfalls:
|
|
|
|
1. **Read the test first** - Understand what the expected behavior is
|
|
2. **Read the source code** - Trace the execution path
|
|
3. **Fix the source, NOT the test** - Unless the test itself is wrong
|
|
4. **Add a regression test** - If none exists for this bug
|
|
5. **Run the full test suite** - Ensure no new issues were introduced
|
|
|
|
### 6. Verify the Fix
|
|
|
|
```bash
|
|
# Python tests
|
|
python -m pytest tests/ backend/tests/ -v --tb=short -x
|
|
|
|
# TypeScript services
|
|
cd ../learning_ai_common_plat && pnpm --filter @lysnrai/<service-name> test
|
|
|
|
# Dashboard builds
|
|
cd admin-dashboard-web && npm run build
|
|
```
|
|
|
|
### 7. Commit with Proper Format
|
|
|
|
```bash
|
|
git add .
|
|
git commit -m "fix(<scope>): <description>"
|
|
# Examples:
|
|
# fix(billing): handle null subscription in usage endpoint
|
|
# fix(platform): validate JWT token in auth middleware
|
|
# fix(admin): resolve dashboard loading state issue
|
|
git push
|
|
```
|
|
|
|
## Common Patterns
|
|
|
|
### Database Connection Issues
|
|
|
|
```bash
|
|
# Check Cosmos DB connectivity
|
|
curl -s http://localhost:4003/health | jq .
|
|
# Look for database errors in logs
|
|
grep -i "database\|cosmos\|connection" .logs/*.log
|
|
```
|
|
|
|
### Authentication Issues
|
|
|
|
```bash
|
|
# Decode JWT to check contents
|
|
echo "<jwt-token>" | cut -d. -f2 | base64 -d | jq .
|
|
# Check auth service health
|
|
curl -s http://localhost:4003/api/auth/me -H "Authorization: Bearer <token>"
|
|
```
|
|
|
|
### Service Dependencies
|
|
|
|
```bash
|
|
# Check if dependent services are running
|
|
docker compose ps
|
|
# Verify service communication
|
|
curl -s http://localhost:4003/health | jq . # consolidated platform service
|
|
```
|
|
|
|
## Notes
|
|
|
|
- **Always check logs first** - Most issues have clear error messages
|
|
- **Isolate the problem** - Don't change multiple things at once
|
|
- **Document the fix** - Add comments if the issue was non-obvious
|
|
- **Consider edge cases** - Think about what might cause this to fail again
|
|
|
|
## Related Skills
|
|
|
|
- [Local Development Setup](./local-development.md) - Start services locally
|
|
- [Production Readiness](./production-readiness.md) - Comprehensive validation
|
|
- [Test Strategies](./test-strategies.md) - Writing effective tests
|