# 🎬 YouTube Transcript to Numbered Files

Scripts that download a YouTube video's transcript and save each caption segment to its own numbered file (`cc1.txt`, `cc2.txt`, `cc3.txt`, etc.).

## ✅ What Was Fixed

The original script wrote all captions to a single `captions.txt` file. Now it:

- **Creates separate files** for each caption segment
- **Numbers files sequentially**: `cc1.txt`, `cc2.txt`, `cc3.txt`, etc.
- **Organizes output** in a dedicated directory
- **Handles errors** gracefully
- **Shows progress** during processing

## 📁 Available Scripts

### 1. `transcribe_yt_video.py` (Fixed Original)

The minimal fixed version of the original script.

```python
# Just change the video ID and run
video_id = "dQw4w9WgXcQ"  # Replace with your video ID
```

### 2. `enhanced_yt_transcript.py` (Recommended)

Full-featured script with a command-line interface and error handling.

## 🚀 Usage

### Quick Start (Fixed Original Script)

```bash
# Edit the video_id in the script, then run:
python transcribe_yt_video.py
```

### Advanced Usage (Enhanced Script)

```bash
# Using video ID
python enhanced_yt_transcript.py dQw4w9WgXcQ

# Using full YouTube URL
python enhanced_yt_transcript.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Custom output directory
python enhanced_yt_transcript.py dQw4w9WgXcQ --output my_captions

# Specify preferred languages
python enhanced_yt_transcript.py dQw4w9WgXcQ --languages en es fr

# Get help
python enhanced_yt_transcript.py --help
```

## 📊 Output Structure

After running, you'll get:

```
captions/
├── cc1.txt          # First caption segment
├── cc2.txt          # Second caption segment
├── cc3.txt          # Third caption segment
├── ...
├── cc150.txt        # Last segment (example)
└── summary.txt      # Summary information
```

Each `cc#.txt` file contains only the text of that caption segment.

## 🔧 Features

### Fixed Original Script

- ✅ **Separate files** for each caption segment
- ✅ **Sequential numbering** (cc1.txt, cc2.txt, etc.)
- ✅ **UTF-8 encoding** for international characters
- ✅ **Progress feedback** showing what's being written

### Enhanced Script

- ✅ **Command-line interface** - no need to edit code
- ✅ **URL parsing** - accepts YouTube URLs or video IDs
- ✅ **Language selection** - prefer specific languages
- ✅ **Error handling** - graceful failures with helpful messages
- ✅ **Progress tracking** - shows processing status
- ✅ **Summary file** - metadata about the download
- ✅ **Directory cleanup** - removes old files before new download

## 📋 Requirements

Install the required package:

```bash
pip install youtube-transcript-api
```

## 💡 Usage Examples

### Example 1: Educational Video

```bash
python enhanced_yt_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID" --output lecture_notes
```

### Example 2: Multi-language Content

```bash
python enhanced_yt_transcript.py VIDEO_ID --languages en es --output multilang_captions
```

### Example 3: Quick Processing

```bash
python enhanced_yt_transcript.py VIDEO_ID
# Creates captions/cc1.txt, captions/cc2.txt, etc.
```

## 🔍 Output Preview

When the script runs, you'll see:

```
🎬 Processing video ID: dQw4w9WgXcQ
✅ Found auto-generated or default transcript
📁 Created directory: captions
📝 Writing 156 segments...
📄 cc10.txt: Never gonna give you up, never gonna let you...
📄 cc20.txt: We've known each other for so long...
📄 cc30.txt: Your heart's been aching but you're too shy...
🎉 Success!
📊 Total segments: 156
📁 Files saved in: /full/path/to/captions/
📋 Summary saved to: captions/summary.txt
```

## 🛠️ Troubleshooting

### Common Issues:

1. **"No transcript found"**
   - Video might not have captions/transcripts
   - Try a different video with confirmed captions

2. **"Transcripts are disabled"**
   - Video owner disabled transcripts
   - Try a different video

3. **Module not found**
   ```bash
   pip install youtube-transcript-api
   ```

### Testing:

Use the test script to verify everything works:

```bash
python test_transcript.py
```

(Remember to replace `TEST_VIDEO_ID` with a real video ID)

## 📝 File Contents Example

**cc1.txt:**
```
Welcome to this tutorial
```

**cc2.txt:**
```
Today we'll be learning about
```

**cc3.txt:**
```
the basics of programming
```

**summary.txt:**
```
YouTube Video ID: dQw4w9WgXcQ
Total segments: 156
Files: cc1.txt to cc156.txt
Generated: enhanced_yt_transcript.py
```

## 🎯 Perfect For:

- **Content analysis** - process each caption separately
- **AI training data** - individual text segments
- **Research projects** - granular transcript analysis
- **Content creation** - extract specific quotes/segments
- **Translation work** - process segments individually

---

**The script is now fixed to write each caption segment to separate `cc#.txt` files as requested!** 🎉
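
For readers who want to see the idea behind the URL parsing, directory cleanup, and per-segment output described above, here is a minimal sketch. It is not the code of either script. The function names `extract_video_id` and `write_segments` are hypothetical, and the sketch assumes each segment is a dict with a `text` key. It deliberately makes no network calls, so fetching the transcript with `youtube-transcript-api` is left out.

```python
import re
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube watch pages (an assumption for this sketch).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a watch URL, a youtu.be URL, or a bare ID."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()
    if host in WATCH_HOSTS:
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    elif host == "youtu.be":
        return parsed.path.lstrip("/")
    # Not a recognised URL: assume the caller passed a bare video ID.
    return url_or_id


def write_segments(segments, output_dir="captions"):
    """Write each segment's text to cc1.txt, cc2.txt, ... and return the count."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Remove numbered files from an earlier run so stale segments do not linger.
    for old in out.glob("cc*.txt"):
        if re.fullmatch(r"cc\d+\.txt", old.name):
            old.unlink()

    # Numbering starts at 1 to match the cc1.txt, cc2.txt naming.
    for index, segment in enumerate(segments, start=1):
        (out / f"cc{index}.txt").write_text(segment["text"], encoding="utf-8")

    return len(segments)
```

Calling `write_segments([{"text": "Welcome to this tutorial"}], "captions")` would create `captions/cc1.txt` containing that one line of text. Writing with an explicit `encoding="utf-8"` keeps international characters intact regardless of the platform default.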