# Multi-Agent YouTube Processing Workflow

A comprehensive multi-agent workflow built with CrewAI that processes YouTube videos through transcription, translation, summarization, and local output.

## 🎯 Overview

This project demonstrates a complete end-to-end workflow using CrewAI agents to:

1. **Transcribe** YouTube videos using OpenAI Whisper
2. **Translate** transcripts to target languages using LLM APIs
3. **Summarize** translated content based on custom prompts
4. **Save** final summaries to local files

## 🏗️ Architecture

### Agents

1. **Transcriber Agent** - Extracts audio from YouTube videos and generates transcripts
2. **Translator Agent** - Translates transcripts between languages
3. **Summarizer Agent** - Creates summaries based on custom prompts
4. **Publisher Agent** - Saves final content to local files

### Workflow Flow

```mermaid
graph TD
    A[YouTube URL] --> B[Transcriber Agent]
    B --> C[Transcript]
    C --> D[Translator Agent]
    D --> E[Translated Text]
    E --> F[Summarizer Agent]
    F --> G[Summary]
    G --> H[Publisher Agent]
    H --> I[Local Files]
```

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- FFmpeg installed on your system
- Valid API keys (see the Configuration section)

### Installation

1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd multi-agent-workflow
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Install FFmpeg** (required for audio processing)
   - **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html)
   - **macOS**: `brew install ffmpeg`
   - **Ubuntu**: `sudo apt update && sudo apt install ffmpeg`

4. **Configure environment variables**

   ```bash
   cp env.example .env
   # Edit .env with your API keys
   ```

### Configuration

Create a `.env` file with the following variables:

```env
# Perplexity API Configuration
PERPLEXITY_API_KEY=your_perplexity_api_key_here

# Local output will be saved to ./output/ directory

# Optional: OpenAI API Key (as backup LLM)
OPENAI_API_KEY=your_openai_api_key_here
```

### API Keys Setup

#### Perplexity API

1. Visit [Perplexity AI](https://perplexity.ai/)
2. Sign up and get your API key
3. Add it to your `.env` file as `PERPLEXITY_API_KEY`

#### OpenAI API (Backup)

1. Visit [OpenAI Platform](https://platform.openai.com/)
2. Create an API key
3. Add it to your `.env` file as `OPENAI_API_KEY`

#### Local Output

Output files are saved automatically to the `./output/` directory in JSON and TXT formats.

## 📖 Usage Examples

### Command Line Interface

Process a complete YouTube video:

```bash
python workflow.py \
  "https://www.youtube.com/watch?v=example" \
  "Spanish" \
  "Summarize in 5 bullet points for students to revise quickly"
```

### Python Script Usage

```python
from workflow import YouTubeProcessingWorkflow

# Initialize workflow
workflow = YouTubeProcessingWorkflow()

# Process video
results = workflow.process_youtube_video(
    youtube_url="https://www.youtube.com/watch?v=example",
    target_language="Spanish",
    summarization_prompt="Summarize in 5 bullet points for students to revise quickly"
)

# Print results
workflow.print_workflow_summary(results)
```

### REST API Usage

#### Start the API Server

```bash
python api.py
```

The server will start on `http://localhost:5000`.

#### Process Video via API

```bash
curl -X POST http://localhost:5000/process \
  -H "Content-Type: application/json" \
  -d '{
    "youtube_url": "https://www.youtube.com/watch?v=example",
    "target_language": "Spanish",
    "summarization_prompt": "Summarize in 5 bullet points for students to revise quickly",
    "metadata": {
      "user_id": "student_123",
      "course": "Data Science 101"
    }
  }'
```

#### Individual Operations

**Transcribe only:**

```bash
curl -X POST http://localhost:5000/transcribe \
  -H "Content-Type: application/json" \
  -d '{"youtube_url": "https://www.youtube.com/watch?v=example"}'
```

**Translate text:**

```bash
curl -X POST http://localhost:5000/translate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "target_language": "Spanish"
  }'
```

**Summarize text:**

```bash
curl -X POST http://localhost:5000/summarize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "summarization_prompt": "Summarize in 5 bullet points"
  }'
```

## 📁 Project Structure

```
multi-agent-workflow/
├── agents/                      # Agent implementations
│   ├── __init__.py
│   ├── transcirer_agent.py      # YouTube transcription
│   ├── translator_agent.py      # Language translation
│   ├── summarizer_agent.py      # Content summarization
│   └── publisher_agent.py       # API publishing
├── utils/                       # Utility modules
│   ├── __init__.py
│   └── speech_processing.py     # Audio processing utilities
├── config.py                    # Configuration management
├── workflow.py                  # Main workflow orchestration
├── api.py                       # REST API interface
├── requirements.txt             # Python dependencies
├── env.example                  # Environment variables template
└── README.md                    # This file
```

## 🔧 Customization

### Adding Custom Prompts

You can customize summarization prompts for different use cases:

```python
# Educational summary
educational_prompt = "Summarize in 5 bullet points for students to revise quickly"

# Business summary
business_prompt = "Create a 3-point executive summary highlighting key business insights"

# Creative summary
creative_prompt = "Rewrite as an engaging story with dialogue and vivid descriptions"
```

### Modifying Agent Behavior

Each agent can be customized in its respective file:

- **Transcriber**: Modify the `YouTubeTranscriber` class in `utils/speech_processing.py`
- **Translator**: Update translation logic in `agents/translator_agent.py`
- **Summarizer**: Customize summarization prompts in `agents/summarizer_agent.py`
- **Publisher**: Modify API integration in `agents/publisher_agent.py`

### Adding New Languages

The translation system supports 100+ languages. Pass the language name as the `target_language` argument, for example:

```python
supported_languages = [
    "Spanish", "French", "German", "Italian", "Portuguese",
    "Chinese", "Japanese", "Korean", "Arabic", "Russian",
    "Dutch", "Swedish", "Norwegian", "Danish", "Finnish"
]
```

## 🐛 Troubleshooting

### Common Issues

#### FFmpeg Not Found

```
Error: ffmpeg not found
```

**Solution**: Install FFmpeg and ensure it is on your system PATH.

#### Whisper Model Download Issues

```
Error downloading Whisper model
```

**Solution**: Check your internet connection and ensure sufficient disk space (~1GB per model).

#### API Key Errors

```
Error: PERPLEXITY_API_KEY not found
```

**Solution**: Verify that your `.env` file contains valid API keys.

#### YouTube Access Issues

```
Error extracting audio from YouTube video
```

**Solution**:

- Ensure the video is public and accessible
- Check whether the video has age restrictions
- Verify that the URL format is correct

### Debug Mode

Enable debug logging for detailed error information:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 📊 Performance Tips

1. **Model Selection**: Use smaller Whisper models (`tiny`, `base`) for faster processing
2. **Batch Processing**: Process multiple videos using the API for better throughput
3. **Caching**: Implement caching for repeated transcriptions of the same video
4. **Async Processing**: Use async/await patterns for large-scale deployments

## 🧪 Testing

Run the test suite:

```bash
# Test individual components
python -m pytest tests/

# Test complete workflow
python test_workflow.py
```

## 📄 API Reference

### Main Workflow Class

#### `YouTubeProcessingWorkflow`

**Methods:**

- `process_youtube_video(youtube_url, target_language, summarization_prompt, metadata=None)`
- `print_workflow_summary(results)`

### REST API Endpoints

#### `POST /process`

Complete video processing workflow

#### `POST /transcribe`

YouTube video transcription only

#### `POST /translate`

Text translation to target language

#### `POST /summarize`

Text summarization based on prompt

#### `GET /health`

Health check endpoint

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Commit changes: `git commit -am 'Add feature'`
4. Push to branch: `git push origin feature-name`
5. Submit a Pull Request

## 📜 License

This project is licensed under the MIT License; see the LICENSE file for details.

## 🙏 Acknowledgments

- [CrewAI](https://crewai.com/) for the agent orchestration framework
- [OpenAI Whisper](https://openai-research.github.io/whisper/) for speech recognition
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) for YouTube video downloading
- [Flask](https://flask.palletsprojects.com/) for the REST API framework

## 📞 Support

For support and questions:

- Create an issue on GitHub
- Contact the development team
- Check the troubleshooting section above
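
## 💾 Appendix: Caching Sketch

The caching tip under Performance Tips (avoid re-transcribing a video that has already been processed) could be approached roughly as follows. This is a minimal sketch and not part of the project's code: the names `video_cache_key`, `cached_transcribe`, and `CACHE_DIR`, the cache location, and the `transcribe_fn` callable are all assumptions made for illustration. It derives a key from the video ID so that the same video with extra query parameters (such as a timestamp) maps to one cache entry.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Union
from urllib.parse import parse_qs, urlparse

# Hypothetical cache location; the project itself only documents ./output/.
CACHE_DIR = Path("./output/transcript_cache")


def video_cache_key(youtube_url: str) -> str:
    """Return a filename-safe key derived from the video ID (or the full URL)."""
    parsed = urlparse(youtube_url)
    video_ids = parse_qs(parsed.query).get("v")
    if video_ids:
        # Standard watch URL: https://www.youtube.com/watch?v=<id>
        identifier = video_ids[0]
    elif parsed.netloc.endswith("youtu.be") and parsed.path.strip("/"):
        # Short URL: https://youtu.be/<id>
        identifier = parsed.path.strip("/")
    else:
        # Unknown shape: fall back to the whole URL.
        identifier = youtube_url
    # Hash the identifier so the key is always safe to use as a filename.
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()[:16]


def cached_transcribe(
    youtube_url: str,
    transcribe_fn: Callable[[str], str],
    cache_dir: Union[str, Path] = CACHE_DIR,
) -> str:
    """Return a cached transcript if one exists, else transcribe and store it.

    `transcribe_fn` is a stand-in for whatever callable performs the real
    transcription; it takes a URL and returns the transcript text.
    """
    cache_dir = Path(cache_dir)
    cache_file = cache_dir / f"{video_cache_key(youtube_url)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))["transcript"]

    transcript = transcribe_fn(youtube_url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    payload = {"url": youtube_url, "transcript": transcript}
    cache_file.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
    return transcript
```

Any function that maps a URL to transcript text can be passed as `transcribe_fn`. The cache has no expiry or invalidation, which is likely acceptable for immutable videos but would need revisiting if transcripts depend on the Whisper model size.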