# YouTube Processing Workflow - Complete Execution Guide
## 📋 Table of Contents
1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Installation](#installation)
4. [Configuration](#configuration)
5. [Execution Methods](#execution-methods)
6. [Expected Output](#expected-output)
7. [Troubleshooting](#troubleshooting)
8. [Examples](#examples)
## 🎯 Overview
This is a multi-agent YouTube processing workflow that:
- **Transcribes** YouTube videos using OpenAI Whisper
- **Translates** transcripts to target languages using LLM APIs
- **Summarizes** content based on custom prompts
- **Saves** results to local files in JSON and TXT formats
## 🔧 Prerequisites
### System Requirements
- **Python 3.8+** installed
- **FFmpeg** installed and in system PATH
- **Internet connection** for API calls and video processing
### API Keys Required
- **Perplexity API Key** (primary LLM)
- **OpenAI API Key** (optional backup LLM)
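The system requirements above can be checked before installing anything else. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` is hypothetical and not part of the project.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The guide asks for Python 3.8 or newer.
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ is required")
    # shutil.which looks the executable up on the system PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    found = check_prerequisites()
    print("Prerequisites OK" if not found else "\n".join(found))
```

This only confirms that the tools are present; it does not verify network access or API keys.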
## 📦 Installation
### 1. Install Python Dependencies
```bash
pip install -r requirements.txt
```
### 2. Install FFmpeg
- **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH
- **macOS**: `brew install ffmpeg`
- **Ubuntu**: `sudo apt update && sudo apt install ffmpeg`
### 3. Verify Installation
```bash
python -c "import crewai, openai, whisper, yt_dlp; print('All dependencies installed successfully!')"
```
## ⚙️ Configuration
### 1. Create Environment File
Copy the example environment file:
```bash
cp env.example .env
```
### 2. Add API Keys
Edit the `.env` file with your API keys:
```env
# Perplexity API Configuration
PERPLEXITY_API_KEY=your_perplexity_api_key_here
# Optional: OpenAI API Key (as backup LLM)
OPENAI_API_KEY=your_openai_api_key_here
```
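To catch a missing or placeholder key early, the `.env` contents can be parsed and checked before running the workflow. This is a minimal sketch that assumes the simple `KEY=VALUE` layout shown above; `parse_env` and `missing_keys` are hypothetical helpers, and the project itself may load the file differently (for example through a dotenv library).

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and quotes, a common source of key errors.
        env[key.strip()] = value.strip().strip("'\"")
    return env


def missing_keys(env, required=("PERPLEXITY_API_KEY",)):
    """Return required keys that are absent, empty, or still the 'your_...' placeholder."""
    return [
        key for key in required
        if not env.get(key) or env[key].startswith("your_")
    ]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list once the Perplexity key has been filled in.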
### 3. Get API Keys
#### Perplexity API
1. Visit [Perplexity AI](https://perplexity.ai/)
2. Sign up and get your API key
3. Add it to your `.env` file
#### OpenAI API (Backup)
1. Visit [OpenAI Platform](https://platform.openai.com/)
2. Create an API key
3. Add it to your `.env` file
## 🚀 Execution Methods
### Method 1: Simple Example (Recommended for Testing)
```bash
python example.py
```
**What it does:**
- Uses a demo YouTube video (Rick Roll)
- Processes in English
- Creates a 5-bullet point summary
- Saves output to `./output/` directory
### Method 2: Command Line Workflow
```bash
python workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" "English" "Summarize in 5 bullet points for students to revise quickly"
```
**Parameters:**
- `youtube_url`: Full YouTube video URL
- `target_language`: Language for translation (e.g., "English", "Spanish", "French")
- `summarization_prompt`: Custom prompt for summary generation
**Example:**
```bash
python workflow.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" "Spanish" "Create a 3-point executive summary"
```
### Method 3: REST API Server
```bash
python api.py
```
**Server starts on:** `http://localhost:5000`
**API Endpoints:**
- `POST /process` - Complete video processing
- `POST /transcribe` - Transcription only
- `POST /translate` - Translation only
- `POST /summarize` - Summarization only
- `GET /health` - Health check
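The `/process` endpoint can also be called from Python. The sketch below uses only the standard library and assumes the request fields `youtube_url`, `target_language`, and `summarization_prompt`, and that the server replies with a JSON body. The helper names and the timeout value are illustrative, not part of the project.

```python
import json
import urllib.request


def build_process_request(youtube_url, target_language, summarization_prompt,
                          base_url="http://localhost:5000"):
    """Build (but do not send) a POST request for the /process endpoint."""
    payload = {
        "youtube_url": youtube_url,
        "target_language": target_language,
        "summarization_prompt": summarization_prompt,
    }
    return urllib.request.Request(
        f"{base_url}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_process_request(url, "English", "Summarize in 5 bullet points"))` performs the call. The generous timeout is a guess, since transcription of long videos can take a while.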
### Method 4: Demo with Multiple Examples
```bash
python demo.py
```
### Method 5: Run Tests
```bash
python test.py
```
## 📊 Expected Output
### File Structure
```
output/
├── YYYYMMDD_HHMMSS_VIDEOID_LANGUAGE.json
└── YYYYMMDD_HHMMSS_VIDEOID_LANGUAGE.txt
```
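The naming pattern above can be reproduced to predict or locate output files. This is a sketch inferred from the pattern alone: it assumes the timestamp is local time formatted as `%Y%m%d_%H%M%S` and that the video ID is the `v` query parameter of the URL. Both helper names are hypothetical.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Return the 'v' query parameter of a watch URL, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    return query["v"][0] if "v" in query else None


def output_basename(url, language, when):
    """Build the YYYYMMDD_HHMMSS_VIDEOID_LANGUAGE stem shared by the .json and .txt files."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{stamp}_{video_id_from_url(url)}_{language}"


if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    print(output_basename(url, "English", datetime.now()) + ".json")
```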
### JSON Output Format
```json
{
  "summary": "Complete summary based on your prompt...",
  "metadata": {
    "youtube_url": "https://www.youtube.com/watch?v=example",
    "target_language": "English",
    "original_transcript_length": 1848,
    "translated_text_length": 1848,
    "workflow_timestamp": "1759118170.3120418",
    "example_run": true,
    "source": "example.py"
  },
  "timestamp": "20250129_143022",
  "type": "youtube_summary",
  "workflow_version": "1.0"
}
```
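Downstream scripts that consume the JSON file may want to confirm its shape first. The sketch below checks only for the top-level keys shown in the sample above; `validate_result` is a hypothetical helper, and the real output may carry additional fields.

```python
import json

# Top-level keys taken from the sample output in this guide.
REQUIRED_TOP_LEVEL = ("summary", "metadata", "timestamp", "type", "workflow_version")


def validate_result(text):
    """Parse a result document and raise ValueError if an expected key is missing."""
    data = json.loads(text)
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in data]
    if missing:
        raise ValueError(f"result is missing keys: {', '.join(missing)}")
    return data
```

A typical use is `validate_result(open(path, encoding="utf-8").read())["summary"]`, where `path` points at one of the generated `.json` files.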
### TXT Output Format
```
# YouTube Video Summary
**Video:** https://www.youtube.com/watch?v=example
**Language:** English
**Generated:** 20250129_143022
---
Summary based on prompt 'Summarize in 5 bullet points for students to revise quickly':
• Point 1: Key insight or main topic
• Point 2: Important detail or concept
• Point 3: Supporting information
• Point 4: Additional context
• Point 5: Conclusion or takeaway
---
**Metadata:**
{
"youtube_url": "https://www.youtube.com/watch?v=example",
"target_language": "English",
"original_transcript_length": 1848,
"translated_text_length": 1848,
"workflow_timestamp": "1759118170.3120418",
"example_run": true,
"source": "example.py"
}
```
### Console Output
```
YouTube Processing Workflow - Simple Example
=======================================================
Configuration looks good!
Processing: https://www.youtube.com/watch?v=dQw4w9WgXcQ
Target Language: English
Summary Prompt: Summarize in 5 bullet points for students to revise quickly
Running workflow...
Starting transcription...
Starting translation to English...
Starting summarization...
Starting local file publishing...
Workflow completed!
================================================================================
YOUTUBE PROCESSING WORKFLOW SUMMARY
================================================================================
YouTube URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
Target Language: English
Summary Prompt: Summarize in 5 bullet points for students to revise quickly
Overall Success: True
STAGE DETAILS:
TRANSCRIPTION:
Success: True
Content Preview: Never gonna give you up, never gonna let you down...
TRANSLATION:
Success: True
Content Preview: Never gonna give you up, never gonna let you down...
SUMMARIZATION:
Success: True
Content Preview: • This is a famous song by Rick Astley
• The song is about commitment and loyalty in relationships...
PUBLISHING:
Success: True
Output Files:
- JSON: ./output/20250129_143022_dQw4w9WgXcQ_English.json
- TXT: ./output/20250129_143022_dQw4w9WgXcQ_English.txt
================================================================================
Example completed successfully!
```
## 🐛 Troubleshooting
### Common Issues
#### 1. FFmpeg Not Found
```
Error: ffmpeg not found
```
**Solution:** Install FFmpeg and ensure it's in your system PATH.
#### 2. API Key Errors
```
Error: PERPLEXITY_API_KEY not found
```
**Solution:**
- Check your `.env` file exists
- Verify API keys are correctly formatted
- Ensure no extra spaces or quotes around the keys
#### 3. YouTube Access Issues
```
Error extracting audio from YouTube video
```
**Solution:**
- Ensure the video is public and accessible
- Check if the video has age restrictions
- Verify the URL format is correct
#### 4. Whisper Model Download Issues
```
Error downloading Whisper model
```
**Solution:**
- Check internet connection
- Ensure sufficient disk space (Whisper model files range from roughly 75 MB for `tiny` to about 3 GB for `large`)
- Try running again (models are cached after first download)
#### 5. Import Errors
```
ModuleNotFoundError: No module named 'crewai'
```
**Solution:**
```bash
pip install -r requirements.txt
```
### Debug Mode
Enable debug logging for detailed error information:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## 📝 Examples
### Example 1: Educational Summary
```bash
python workflow.py "https://www.youtube.com/watch?v=example" "English" "Summarize in 5 bullet points for students to revise quickly"
```
### Example 2: Business Summary
```bash
python workflow.py "https://www.youtube.com/watch?v=example" "English" "Create a 3-point executive summary highlighting key business insights"
```
### Example 3: Creative Summary
```bash
python workflow.py "https://www.youtube.com/watch?v=example" "English" "Rewrite as an engaging story with dialogue and vivid descriptions"
```
### Example 4: Multi-language Processing
```bash
# Spanish
python workflow.py "https://www.youtube.com/watch?v=example" "Spanish" "Resumir en 5 puntos clave"
# French
python workflow.py "https://www.youtube.com/watch?v=example" "French" "Résumer en 5 points principaux"
# German
python workflow.py "https://www.youtube.com/watch?v=example" "German" "In 5 Hauptpunkten zusammenfassen"
```
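The three commands above can be driven from a single script. This sketch assumes `workflow.py` sits in the current directory and takes the three positional arguments described under Method 2. Whether it signals failure through a non-zero exit code is an assumption. `build_command` and `run_all` are hypothetical names.

```python
import subprocess
import sys

# (target language, summarization prompt) pairs from the examples above.
JOBS = [
    ("Spanish", "Resumir en 5 puntos clave"),
    ("French", "Résumer en 5 points principaux"),
    ("German", "In 5 Hauptpunkten zusammenfassen"),
]


def build_command(url, language, prompt):
    """Return the argument list for one workflow.py run, using the current interpreter."""
    return [sys.executable, "workflow.py", url, language, prompt]


def run_all(url, jobs=JOBS):
    """Run the workflow once per language and collect each process's exit code."""
    results = {}
    for language, prompt in jobs:
        completed = subprocess.run(build_command(url, language, prompt))
        results[language] = completed.returncode
    return results
```

Calling `run_all("https://www.youtube.com/watch?v=example")` processes the languages one after another. Passing the arguments as a list avoids shell quoting problems with accented characters and spaces in the prompts.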
### Example 5: API Usage
```bash
# Start server
python api.py
# Process video via API
curl -X POST http://localhost:5000/process \
  -H "Content-Type: application/json" \
  -d '{
    "youtube_url": "https://www.youtube.com/watch?v=example",
    "target_language": "English",
    "summarization_prompt": "Summarize in 5 bullet points",
    "metadata": {
      "user_id": "student_123",
      "course": "Data Science 101"
    }
  }'
```
## 📈 Performance Tips
1. **Model Selection**: Use smaller Whisper models for faster processing
2. **Batch Processing**: Process multiple videos using the API
3. **Caching**: Models are cached after first download
4. **Async Processing**: Use async/await patterns for large-scale deployments
## 🔍 Supported Languages
The system supports roughly 100 languages (translation and summary quality depend on the LLM in use), including:
- **European**: English, Spanish, French, German, Italian, Portuguese, Dutch, Swedish, Norwegian, Danish, Finnish
- **Asian**: Chinese, Japanese, Korean, Hindi, Thai, Vietnamese
- **Middle Eastern**: Arabic, Hebrew, Turkish
- **Others**: Russian, Polish, Czech, Hungarian, Romanian
## 📞 Support
For issues and questions:
1. Check the troubleshooting section above
2. Verify all prerequisites are met
3. Check API key configuration
4. Review console output for specific error messages
## 🎉 Success Indicators
Your workflow is working correctly when you see:
- ✅ "Configuration looks good!" message
- ✅ All stages show "Success: True"
- ✅ Output files created in `./output/` directory
- ✅ "Example completed successfully!" message
- ✅ Full summaries (not truncated text)
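The output-file indicator can be checked programmatically. This is a minimal sketch that assumes each run writes a `.json` and a `.txt` file sharing the same base name, as shown under Expected Output; `find_output_pairs` is a hypothetical helper.

```python
from pathlib import Path


def find_output_pairs(output_dir="./output"):
    """Return (json_path, txt_path) tuples for runs that produced both files."""
    out = Path(output_dir)
    pairs = []
    for json_path in sorted(out.glob("*.json")):
        txt_path = json_path.with_suffix(".txt")
        # A run counts as complete only when both files are present.
        if txt_path.exists():
            pairs.append((json_path, txt_path))
    return pairs
```

An empty list after a run suggests the publishing stage did not finish, in which case the console output is the next place to look.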
---
**Last Updated:** January 29, 2025
**Version:** 1.0
**Author:** YouTube Processing Workflow Team