Multi-Agent YouTube Processing Workflow
A comprehensive multi-agent workflow built with CrewAI that processes YouTube videos through transcription, translation, summarization, and local output.
🎯 Overview
This project demonstrates a complete end-to-end workflow using CrewAI agents to:
- Transcribe YouTube videos using OpenAI Whisper
- Translate transcripts to target languages using LLM APIs
- Summarize translated content based on custom prompts
- Save final summaries to local files
🏗️ Architecture
Agents
- Transcriber Agent - Extracts audio from YouTube videos and generates transcripts
- Translator Agent - Translates transcripts between languages
- Summarizer Agent - Creates summaries based on custom prompts
- Publisher Agent - Saves final content to local files
Workflow Flow
graph TD
A[YouTube URL] --> B[Transcriber Agent]
B --> C[Transcript]
C --> D[Translator Agent]
D --> E[Translated Text]
E --> F[Summarizer Agent]
F --> G[Summary]
G --> H[Publisher Agent]
H --> I[Local Files]
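The flow above is a linear chain in which each agent consumes the previous agent's output. The hand-off can be sketched in plain Python as below. The four stage functions are hypothetical stand-ins, not the project's CrewAI agents, and their stub bodies only tag the text so the data flow stays visible.

```python
from typing import Dict

# Hypothetical stand-ins for the four agents. The real agents would call
# Whisper, an LLM API, and the file system; these stubs only label the text.
def transcribe(youtube_url: str) -> str:
    return f"transcript of {youtube_url}"

def translate(text: str, target_language: str) -> str:
    return f"[{target_language}] {text}"

def summarize(text: str, prompt: str) -> str:
    return f"summary ({prompt}): {text}"

def publish(summary: str) -> Dict[str, str]:
    return {"status": "saved", "content": summary}

def run_pipeline(youtube_url: str, target_language: str, prompt: str) -> Dict[str, str]:
    """Run the four stages in order, feeding each output into the next stage."""
    transcript = transcribe(youtube_url)
    translated = translate(transcript, target_language)
    summary = summarize(translated, prompt)
    return publish(summary)

if __name__ == "__main__":
    print(run_pipeline("https://www.youtube.com/watch?v=example", "Spanish", "5 bullet points"))
```

Because each stage depends only on the previous stage's return value, any one of them can be replaced or tested in isolation.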
🚀 Quick Start
Prerequisites
- Python 3.8+
- FFmpeg installed on your system
- Valid API keys (see Configuration section)
Installation
- Clone the repository:
git clone <repository-url>
cd multi-agent-workflow
- Install dependencies:
pip install -r requirements.txt
- Install FFmpeg (required for audio processing):
  - Windows: Download from ffmpeg.org
  - macOS: brew install ffmpeg
  - Ubuntu: sudo apt update && sudo apt install ffmpeg
- Configure environment variables:
cp env.example .env
# Edit .env with your API keys
Configuration
Create a .env file with the following variables:
# Perplexity API Configuration
PERPLEXITY_API_KEY=your_perplexity_api_key_here
# Local output will be saved to ./output/ directory
# Optional: OpenAI API Key (as backup LLM)
OPENAI_API_KEY=your_openai_api_key_here
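The variables above can be validated at startup so that a missing key fails early with a clear message. The sketch below assumes the values have already been exported into the process environment (for example by a .env loader). The function name `load_api_keys` is hypothetical and is not taken from the project's config.py.

```python
import os
from typing import Dict, Mapping, Optional

def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Read the API keys from env; raise if the required key is missing.

    PERPLEXITY_API_KEY is treated as required and OPENAI_API_KEY as optional,
    mirroring the comments in the .env template.
    """
    env = os.environ if env is None else env
    perplexity_key = env.get("PERPLEXITY_API_KEY", "").strip()
    if not perplexity_key:
        raise RuntimeError("PERPLEXITY_API_KEY not found; add it to your .env file")
    # An empty or absent backup key is normalized to None.
    openai_key = env.get("OPENAI_API_KEY", "").strip() or None
    return {"perplexity": perplexity_key, "openai": openai_key}
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to test.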
API Keys Setup
Perplexity API
- Visit Perplexity AI
- Sign up and get your API key
- Add it to your .env file as PERPLEXITY_API_KEY
OpenAI API (Backup)
- Visit OpenAI Platform
- Create an API key
- Add it to your .env file as OPENAI_API_KEY
Local Output
Output files will be automatically saved to the ./output/ directory in JSON and TXT formats.
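As a rough illustration of writing both formats, the sketch below saves one summary as a TXT file and a JSON file in the same directory. The file names, the JSON layout, and the `save_summary` helper are assumptions made for this example; the Publisher Agent's actual output may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Tuple

def save_summary(
    summary: str,
    metadata: Optional[dict] = None,
    output_dir: str = "output",
    stem: str = "summary",
) -> Tuple[Path, Path]:
    """Write <stem>.txt and <stem>.json into output_dir and return both paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create ./output/ on first use
    txt_path = out / f"{stem}.txt"
    json_path = out / f"{stem}.json"
    txt_path.write_text(summary, encoding="utf-8")
    document = {"summary": summary, "metadata": metadata or {}}
    # ensure_ascii=False keeps translated, non-ASCII text readable in the file.
    json_path.write_text(json.dumps(document, ensure_ascii=False, indent=2), encoding="utf-8")
    return txt_path, json_path

if __name__ == "__main__":
    # The demo writes into a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        txt_file, json_file = save_summary("Resumen de ejemplo", {"course": "Data Science 101"}, output_dir=tmp)
        print(txt_file.name, json_file.name)
```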
📖 Usage Examples
Command Line Interface
Process a complete YouTube video:
python workflow.py \
"https://www.youtube.com/watch?v=example" \
"Spanish" \
"Summarize in 5 bullet points for students to revise quickly"
Python Script Usage
from workflow import YouTubeProcessingWorkflow
# Initialize workflow
workflow = YouTubeProcessingWorkflow()
# Process video
results = workflow.process_youtube_video(
youtube_url="https://www.youtube.com/watch?v=example",
target_language="Spanish",
summarization_prompt="Summarize in 5 bullet points for students to revise quickly"
)
# Print results
workflow.print_workflow_summary(results)
REST API Usage
Start the API Server
python api.py
The server will start on http://localhost:5000
Process Video via API
curl -X POST http://localhost:5000/process \
-H "Content-Type: application/json" \
-d '{
"youtube_url": "https://www.youtube.com/watch?v=example",
"target_language": "Spanish",
"summarization_prompt": "Summarize in 5 bullet points for students to revise quickly",
"metadata": {
"user_id": "student_123",
"course": "Data Science 101"
}
}'
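The same request can be issued from Python with only the standard library. The sketch below mirrors the curl call above. The helper names are hypothetical, and the assumption that the server answers with a JSON body is not confirmed by this README.

```python
import json
import urllib.request
from typing import Optional

def build_process_request(
    base_url: str,
    youtube_url: str,
    target_language: str,
    summarization_prompt: str,
    metadata: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST /process request from the curl example."""
    payload = {
        "youtube_url": youtube_url,
        "target_language": target_language,
        "summarization_prompt": summarization_prompt,
    }
    if metadata is not None:
        payload["metadata"] = metadata
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_process_request("http://localhost:5000", url, "Spanish", prompt))` performs the call. The generous default timeout reflects that transcription can take a while.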
Individual Operations
Transcribe only:
curl -X POST http://localhost:5000/transcribe \
-H "Content-Type: application/json" \
-d '{"youtube_url": "https://www.youtube.com/watch?v=example"}'
Translate text:
curl -X POST http://localhost:5000/translate \
-H "Content-Type: application/json" \
-d '{
"text": "Your text here",
"target_language": "Spanish"
}'
Summarize text:
curl -X POST http://localhost:5000/summarize \
-H "Content-Type: application/json" \
-d '{
"text": "Your text here",
"summarization_prompt": "Summarize in 5 bullet points"
}'
📁 Project Structure
multi-agent-workflow/
├── agents/ # Agent implementations
│ ├── __init__.py
│ ├── transcriber_agent.py # YouTube transcription
│ ├── translator_agent.py # Language translation
│ ├── summarizer_agent.py # Content summarization
│ └── publisher_agent.py # Local file output
├── utils/ # Utility modules
│ ├── __init__.py
│ └── speech_processing.py # Audio processing utilities
├── config.py # Configuration management
├── workflow.py # Main workflow orchestration
├── api.py # REST API interface
├── requirements.txt # Python dependencies
├── env.example # Environment variables template
└── README.md # This file
🔧 Customization
Adding Custom Prompts
You can customize summarization prompts for different use cases:
# Educational summary
educational_prompt = "Summarize in 5 bullet points for students to revise quickly"
# Business summary
business_prompt = "Create a 3-point executive summary highlighting key business insights"
# Creative summary
creative_prompt = "Rewrite as an engaging story with dialogue and vivid descriptions"
Modifying Agent Behavior
Each agent can be customized in its respective file:
- Transcriber: Modify the YouTubeTranscriber class in utils/speech_processing.py
- Translator: Update the translation logic in agents/translator_agent.py
- Summarizer: Customize the summarization prompts in agents/summarizer_agent.py
- Publisher: Modify the local file output logic in agents/publisher_agent.py
Adding New Languages
The translation system supports 100+ languages. To use one, pass its name as the target_language value, for example:
supported_languages = [
"Spanish", "French", "German", "Italian", "Portuguese",
"Chinese", "Japanese", "Korean", "Arabic", "Russian",
"Dutch", "Swedish", "Norwegian", "Danish", "Finnish"
]
🐛 Troubleshooting
Common Issues
FFmpeg Not Found
Error: ffmpeg not found
Solution: Install FFmpeg and ensure it's in your system PATH.
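Whether FFmpeg is reachable can be checked before any processing starts. The sketch below uses the standard library's `shutil.which`; the helper name `ffmpeg_available` is made up for this example and is not part of the project.

```python
import shutil

def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the executable can be found on the system PATH."""
    return shutil.which(executable) is not None

if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found: install it and make sure it is on your PATH")
```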
Whisper Model Download Issues
Error downloading Whisper model
Solution: Check your internet connection and ensure sufficient disk space. Model size varies widely, from under 100 MB for tiny to roughly 3 GB for large.
API Key Errors
Error: PERPLEXITY_API_KEY not found
Solution: Verify your .env file contains valid API keys.
YouTube Access Issues
Error extracting audio from YouTube video
Solution:
- Ensure the video is public and accessible
- Check if the video has age restrictions
- Verify the URL format is correct
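The URL format check in the last item can be automated. The sketch below recognizes only two common shapes, `youtube.com/watch?v=<id>` and `youtu.be/<id>`, and is an illustration rather than the project's own validation. It cannot tell whether a video is public or age-restricted.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")

def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for the two supported URL shapes, otherwise None."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.strip("/").split("/")[0]
        return video_id or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None
```

A return value of `None` means the URL should be rejected before any download is attempted.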
Debug Mode
Enable debug logging for detailed error information:
import logging
logging.basicConfig(level=logging.DEBUG)
📊 Performance Tips
- Model Selection: Use smaller Whisper models (tiny, base) for faster processing
- Batch Processing: Process multiple videos using the API for better throughput
- Caching: Implement caching for repeated transcriptions of the same video
- Async Processing: Use async/await patterns for large-scale deployments
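The caching tip can be illustrated with a small in-memory cache keyed by video URL. The `TranscriptCache` class is a hypothetical sketch; it wraps any transcription callable and does not reflect how the project stores transcripts. A persistent cache (on disk, for example) would follow the same pattern.

```python
from typing import Callable, Dict

class TranscriptCache:
    """In-memory cache that transcribes each URL at most once."""

    def __init__(self, transcribe: Callable[[str], str]) -> None:
        self._transcribe = transcribe
        self._store: Dict[str, str] = {}

    def get(self, youtube_url: str) -> str:
        """Return the cached transcript, computing it on the first request."""
        if youtube_url not in self._store:
            self._store[youtube_url] = self._transcribe(youtube_url)
        return self._store[youtube_url]

if __name__ == "__main__":
    calls = []

    def fake_transcribe(url: str) -> str:
        # Stand-in for the expensive download-and-transcribe step.
        calls.append(url)
        return f"transcript of {url}"

    cache = TranscriptCache(fake_transcribe)
    cache.get("https://www.youtube.com/watch?v=example")
    cache.get("https://www.youtube.com/watch?v=example")
    print(len(calls))  # the expensive step ran once
```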
🧪 Testing
Run the test suite:
# Test individual components
python -m pytest tests/
# Test complete workflow
python test_workflow.py
📄 API Reference
Main Workflow Class
YouTubeProcessingWorkflow
Methods:
- process_youtube_video(youtube_url, target_language, summarization_prompt, metadata=None)
- print_workflow_summary(results)
REST API Endpoints
POST /process
Complete video processing workflow
POST /transcribe
YouTube video transcription only
POST /translate
Text translation to target language
POST /summarize
Text summarization based on prompt
GET /health
Health check endpoint
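A client can check a payload before sending it to one of the POST endpoints. In the sketch below, the field lists are inferred from the curl examples in this README; whether the server actually treats every field as required is an assumption, and `missing_fields` is a hypothetical helper.

```python
from typing import Dict, List, Mapping, Tuple

# Field names taken from the curl examples; "required" is an assumption.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "/process": ("youtube_url", "target_language", "summarization_prompt"),
    "/transcribe": ("youtube_url",),
    "/translate": ("text", "target_language"),
    "/summarize": ("text", "summarization_prompt"),
}

def missing_fields(endpoint: str, payload: Mapping[str, object]) -> List[str]:
    """Return the expected fields that are absent or empty in the payload."""
    if endpoint not in REQUIRED_FIELDS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return [name for name in REQUIRED_FIELDS[endpoint] if not payload.get(name)]
```

An empty list means the payload carries every field used in the corresponding example request.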
🤝 Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Commit changes: git commit -am 'Add feature'
- Push to branch: git push origin feature-name
- Submit a Pull Request
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- CrewAI for the agent orchestration framework
- OpenAI Whisper for speech recognition
- yt-dlp for YouTube video downloading
- Flask for the REST API framework
📞 Support
For support and questions:
- Create an issue on GitHub
- Contact the development team
- Check the troubleshooting section above