# Multi-Agent YouTube Processing Workflow

A comprehensive multi-agent workflow built with CrewAI that processes YouTube videos through transcription, translation, summarization, and local output.

## 🎯 Overview

This project demonstrates a complete end-to-end workflow using CrewAI agents to:

1. **Transcribe** YouTube videos using OpenAI Whisper
2. **Translate** transcripts to target languages using LLM APIs
3. **Summarize** translated content based on custom prompts
4. **Save** final summaries to local files

## 🏗️ Architecture

### Agents

1. **Transcriber Agent** - Extracts audio from YouTube videos and generates transcripts
2. **Translator Agent** - Translates transcripts between languages
3. **Summarizer Agent** - Creates summaries based on custom prompts
4. **Publisher Agent** - Saves final content to local files

### Workflow Flow

```mermaid
graph TD
    A[YouTube URL] --> B[Transcriber Agent]
    B --> C[Transcript]
    C --> D[Translator Agent]
    D --> E[Translated Text]
    E --> F[Summarizer Agent]
    F --> G[Summary]
    G --> H[Publisher Agent]
    H --> I[Local Files]
```

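The sequential hand-off in the diagram can be illustrated with a minimal sketch. It does not use CrewAI and is not the project's implementation: `Stage`, `run_pipeline`, and the placeholder lambdas are hypothetical names that only show how each stage's output becomes the next stage's input.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    """One step of the pipeline (hypothetical helper, not a CrewAI class)."""
    name: str
    run: Callable[[str], str]


def run_pipeline(stages: List[Stage], initial_input: str) -> Dict[str, str]:
    """Feed each stage's output into the next and keep every intermediate result."""
    results = {"input": initial_input}
    current = initial_input
    for stage in stages:
        current = stage.run(current)
        results[stage.name] = current
    return results


# Placeholder stages standing in for the four agents.
stages = [
    Stage("transcript", lambda url: f"transcript of {url}"),
    Stage("translation", lambda text: f"[Spanish] {text}"),
    Stage("summary", lambda text: f"summary: {text}"),
    Stage("saved_to", lambda text: "output/summary.txt"),
]

results = run_pipeline(stages, "https://www.youtube.com/watch?v=example")
```

Keeping every intermediate result in one dictionary makes it easy to inspect where a run went wrong, for example whether a poor summary started with a poor transcript.
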
## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- FFmpeg installed on your system
- Valid API keys (see Configuration section)

### Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd multi-agent-workflow
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Install FFmpeg** (required for audio processing)
   - **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html)
   - **macOS**: `brew install ffmpeg`
   - **Ubuntu**: `sudo apt update && sudo apt install ffmpeg`

4. **Configure environment variables**
   ```bash
   cp env.example .env
   # Edit .env with your API keys
   ```

### Configuration

Create a `.env` file with the following variables:

```env
# Perplexity API Configuration
PERPLEXITY_API_KEY=your_perplexity_api_key_here

# Local output will be saved to ./output/ directory

# Optional: OpenAI API Key (as backup LLM)
OPENAI_API_KEY=your_openai_api_key_here
```

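As a rough illustration of how these variables might be read and checked, the sketch below parses `.env`-style text with the standard library and picks a usable key. The helper names `parse_env` and `select_api_key` are hypothetical, and the "prefer Perplexity, fall back to OpenAI" rule is an assumption based on the "backup LLM" comment above; the project's `config.py` may behave differently.

```python
from typing import Dict, Mapping, Tuple


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def select_api_key(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (provider, key). Assumed policy: Perplexity first, then OpenAI."""
    candidates = (
        ("perplexity", "PERPLEXITY_API_KEY"),
        ("openai", "OPENAI_API_KEY"),
    )
    for provider, variable in candidates:
        key = env.get(variable, "")
        # Treat the template placeholders ("your_..._here") as unset.
        if key and not key.startswith("your_"):
            return provider, key
    raise RuntimeError("Set PERPLEXITY_API_KEY or OPENAI_API_KEY in .env")


sample = """
# Perplexity API Configuration
PERPLEXITY_API_KEY=your_perplexity_api_key_here
OPENAI_API_KEY=sk-test
"""
provider, key = select_api_key(parse_env(sample))
```

Rejecting the template placeholders early gives a clearer error than a failed API call later in the workflow.
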
### API Keys Setup

#### Perplexity API
1. Visit [Perplexity AI](https://perplexity.ai/)
2. Sign up and get your API key
3. Add it to your `.env` file as `PERPLEXITY_API_KEY`

#### OpenAI API (Backup)
1. Visit [OpenAI Platform](https://platform.openai.com/)
2. Create an API key
3. Add it to your `.env` file as `OPENAI_API_KEY`

#### Local Output
Output files will be automatically saved to the `./output/` directory in JSON and TXT formats.

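A minimal sketch of writing one result in both formats follows. The function name `save_summary`, the file naming, and the JSON fields are assumptions for illustration; the Publisher Agent's actual file layout is not specified here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Tuple


def save_summary(
    summary: str,
    metadata: Optional[dict] = None,
    output_dir: str = "output",
    stem: str = "summary",
) -> Tuple[Path, Path]:
    """Write the summary as <stem>.txt and <stem>.json inside output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create ./output/ on first use

    txt_path = out / f"{stem}.txt"
    json_path = out / f"{stem}.json"

    txt_path.write_text(summary, encoding="utf-8")
    payload = {
        "summary": summary,
        "metadata": metadata or {},
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    # ensure_ascii=False keeps translated, non-English text readable in the file.
    json_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return txt_path, json_path
```

Calling `save_summary("Resumen...", {"course": "Data Science 101"})` would create `output/summary.txt` and `output/summary.json` under the current directory.
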
## 📖 Usage Examples

### Command Line Interface

Process a complete YouTube video:

```bash
python workflow.py \
  "https://www.youtube.com/watch?v=example" \
  "Spanish" \
  "Summarize in 5 bullet points for students to revise quickly"
```

### Python Script Usage

```python
from workflow import YouTubeProcessingWorkflow

# Initialize workflow
workflow = YouTubeProcessingWorkflow()

# Process video
results = workflow.process_youtube_video(
    youtube_url="https://www.youtube.com/watch?v=example",
    target_language="Spanish",
    summarization_prompt="Summarize in 5 bullet points for students to revise quickly"
)

# Print results
workflow.print_workflow_summary(results)
```

### REST API Usage

#### Start the API Server

```bash
python api.py
```

The server will start on `http://localhost:5000`.

#### Process Video via API

```bash
curl -X POST http://localhost:5000/process \
  -H "Content-Type: application/json" \
  -d '{
    "youtube_url": "https://www.youtube.com/watch?v=example",
    "target_language": "Spanish",
    "summarization_prompt": "Summarize in 5 bullet points for students to revise quickly",
    "metadata": {
      "user_id": "student_123",
      "course": "Data Science 101"
    }
  }'
```

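The same request can be built from Python with only the standard library. This is a sketch: `build_process_request` is a hypothetical helper, the payload fields mirror the `curl` example above, and the shape of the server's response is not documented here, so the code stops short of interpreting it.

```python
import json
import urllib.request
from typing import Optional


def build_process_request(
    base_url: str,
    youtube_url: str,
    target_language: str,
    summarization_prompt: str,
    metadata: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /process request with a JSON body."""
    payload = {
        "youtube_url": youtube_url,
        "target_language": target_language,
        "summarization_prompt": summarization_prompt,
    }
    if metadata:
        payload["metadata"] = metadata
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_process_request(
    "http://localhost:5000",
    "https://www.youtube.com/watch?v=example",
    "Spanish",
    "Summarize in 5 bullet points for students to revise quickly",
    metadata={"user_id": "student_123", "course": "Data Science 101"},
)

# To send it once the server is running (transcription can take a while,
# so a generous timeout is a reasonable assumption):
# with urllib.request.urlopen(request, timeout=600) as response:
#     result = json.load(response)
```
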
#### Individual Operations

**Transcribe only:**
```bash
curl -X POST http://localhost:5000/transcribe \
  -H "Content-Type: application/json" \
  -d '{"youtube_url": "https://www.youtube.com/watch?v=example"}'
```

**Translate text:**
```bash
curl -X POST http://localhost:5000/translate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "target_language": "Spanish"
  }'
```

**Summarize text:**
```bash
curl -X POST http://localhost:5000/summarize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "summarization_prompt": "Summarize in 5 bullet points"
  }'
```

## 📁 Project Structure

```
multi-agent-workflow/
├── agents/                    # Agent implementations
│   ├── __init__.py
│   ├── transcirer_agent.py    # YouTube transcription
│   ├── translator_agent.py    # Language translation
│   ├── summarizer_agent.py    # Content summarization
│   └── publisher_agent.py     # Local file output
├── utils/                     # Utility modules
│   ├── __init__.py
│   └── speech_processing.py   # Audio processing utilities
├── config.py                  # Configuration management
├── workflow.py                # Main workflow orchestration
├── api.py                     # REST API interface
├── requirements.txt           # Python dependencies
├── env.example                # Environment variables template
└── README.md                  # This file
```

## 🔧 Customization

### Adding Custom Prompts

You can customize summarization prompts for different use cases:

```python
# Educational summary
educational_prompt = "Summarize in 5 bullet points for students to revise quickly"

# Business summary
business_prompt = "Create a 3-point executive summary highlighting key business insights"

# Creative summary
creative_prompt = "Rewrite as an engaging story with dialogue and vivid descriptions"
```

### Modifying Agent Behavior

Each agent can be customized in its respective file:

- **Transcriber**: Modify the `YouTubeTranscriber` class in `utils/speech_processing.py`
- **Translator**: Update translation logic in `agents/translator_agent.py`
- **Summarizer**: Customize summarization prompts in `agents/summarizer_agent.py`
- **Publisher**: Modify local file output in `agents/publisher_agent.py`

### Adding New Languages

The translation system supports 100+ languages. To use one, pass its name as the target language, for example:

```python
supported_languages = [
    "Spanish", "French", "German", "Italian", "Portuguese",
    "Chinese", "Japanese", "Korean", "Arabic", "Russian",
    "Dutch", "Swedish", "Norwegian", "Danish", "Finnish"
]
```

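If you prefer to restrict requests to a known set of languages before calling the translator, a small normalization step can catch typos early. This is an optional sketch, not part of the project: `normalize_language` and `KNOWN_LANGUAGES` are hypothetical names, and the list simply repeats the examples above.

```python
KNOWN_LANGUAGES = (
    "Spanish", "French", "German", "Italian", "Portuguese",
    "Chinese", "Japanese", "Korean", "Arabic", "Russian",
    "Dutch", "Swedish", "Norwegian", "Danish", "Finnish",
)


def normalize_language(name: str) -> str:
    """Return the canonical spelling of a language name, ignoring case and spaces."""
    cleaned = name.strip().casefold()
    for language in KNOWN_LANGUAGES:
        if language.casefold() == cleaned:
            return language
    raise ValueError(f"Unsupported target language: {name!r}")
```

For example, `normalize_language("  spanish ")` returns `"Spanish"`, while an unknown name raises `ValueError` before any API call is made.
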
## 🐛 Troubleshooting

### Common Issues

#### FFmpeg Not Found
```
Error: ffmpeg not found
```
**Solution**: Install FFmpeg and ensure it's in your system PATH.

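To confirm whether Python can see FFmpeg on your PATH, a quick check with the standard library's `shutil.which` is usually enough. The helper name `require_ffmpeg` is hypothetical and shown only as a sketch.

```python
import shutil


def require_ffmpeg(executable: str = "ffmpeg") -> str:
    """Return the full path to the executable, or raise if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError(
            f"{executable} was not found on PATH; install FFmpeg and restart your shell"
        )
    return path
```

Running `require_ffmpeg()` from the same environment that runs the workflow shows whether the PATH problem is in your shell or in the Python process.
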
#### Whisper Model Download Issues
```
Error downloading Whisper model
```
**Solution**: Check your internet connection and ensure sufficient disk space. Model files range from under 200 MB for `tiny` and `base` up to roughly 3 GB for `large`.

#### API Key Errors
```
Error: PERPLEXITY_API_KEY not found
```
**Solution**: Verify your `.env` file contains valid API keys.

#### YouTube Access Issues
```
Error extracting audio from YouTube video
```
**Solution**:
- Ensure the video is public and accessible
- Check if the video has age restrictions
- Verify the URL format is correct

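The URL format check can be done before any download is attempted. The sketch below handles only two common shapes, `youtube.com/watch?v=...` and `youtu.be/...`; other forms such as Shorts or embed links are deliberately left out, and `extract_video_id` is a hypothetical helper rather than part of the project.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for watch and youtu.be URLs, or None if unrecognized."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate or None
```

A `None` result means the URL did not match either supported shape, which is a cheaper signal than waiting for the audio extraction to fail.
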
### Debug Mode

Enable debug logging for detailed error information:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 📊 Performance Tips

1. **Model Selection**: Use smaller Whisper models (`tiny`, `base`) for faster processing
2. **Batch Processing**: Process multiple videos using the API for better throughput
3. **Caching**: Implement caching for repeated transcriptions of the same video
4. **Async Processing**: Use async/await patterns for large-scale deployments

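The caching tip could be implemented in many ways; one simple option is a file cache keyed by a hash of the URL, sketched below. The function `cached_transcript`, the cache directory, and the JSON layout are assumptions for illustration, and `transcribe` stands in for whatever callable produces a transcript in your setup.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_transcript(
    url: str,
    transcribe: Callable[[str], str],
    cache_dir: str = ".cache/transcripts",
) -> str:
    """Return a stored transcript for url, calling transcribe only on a cache miss."""
    key = hashlib.sha256(url.strip().encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"

    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["transcript"]

    transcript = transcribe(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps({"url": url, "transcript": transcript}, ensure_ascii=False),
        encoding="utf-8",
    )
    return transcript
```

Hashing the raw URL means two different URLs for the same video are cached separately; keying on the video ID instead would avoid that at the cost of a parsing step.
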
## 🧪 Testing

Run the test suite:

```bash
# Test individual components
python -m pytest tests/

# Test complete workflow
python test_workflow.py
```

## 📄 API Reference

### Main Workflow Class

#### `YouTubeProcessingWorkflow`

**Methods:**
- `process_youtube_video(youtube_url, target_language, summarization_prompt, metadata=None)`
- `print_workflow_summary(results)`

### REST API Endpoints

#### `POST /process`
Complete video processing workflow

#### `POST /transcribe`
YouTube video transcription only

#### `POST /translate`
Text translation to target language

#### `POST /summarize`
Text summarization based on prompt

#### `GET /health`
Health check endpoint

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Commit changes: `git commit -am 'Add feature'`
4. Push to branch: `git push origin feature-name`
5. Submit a Pull Request

## 📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- [CrewAI](https://crewai.com/) for the agent orchestration framework
- [OpenAI Whisper](https://github.com/openai/whisper) for speech recognition
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) for YouTube video downloading
- [Flask](https://flask.palletsprojects.com/) for the REST API framework

## 📞 Support

For support and questions:
- Create an issue on GitHub
- Contact the development team
- Check the troubleshooting section above