🎬 YouTube Transcript to Numbered Files
Fixed scripts that download a YouTube video's transcript and save each caption segment to its own numbered file (cc1.txt, cc2.txt, cc3.txt, etc.).
✅ What Was Fixed
The original script wrote all captions to a single captions.txt file. Now it:
- Creates separate files for each caption segment
- Numbers files sequentially: cc1.txt, cc2.txt, cc3.txt, etc.
- Organizes output in a dedicated directory
- Handles errors gracefully
- Shows progress during processing
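The per-segment write described above might look roughly like the sketch below. It is an illustration rather than the script's actual code: the function name `write_segments` is hypothetical, and it assumes each segment is a dict with a "text" key.

```python
from pathlib import Path


def write_segments(segments, output_dir="captions"):
    """Write each caption segment to its own file: cc1.txt, cc2.txt, ...

    Assumes (hypothetically) that every segment is a dict with a "text" key.
    Returns the list of paths that were written, in order.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if needed

    written = []
    for index, segment in enumerate(segments, start=1):
        path = out / f"cc{index}.txt"
        # UTF-8 keeps international characters intact
        path.write_text(segment["text"], encoding="utf-8")
        written.append(path)
    return written
```

Numbering starts at 1 so the file names match the cc1.txt, cc2.txt, ... pattern used throughout this README.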
📁 Available Scripts
1. transcribe_yt_video.py (Fixed Original)
The minimal fixed version of your original script.
# Just change the video ID and run
video_id = "dQw4w9WgXcQ" # Replace with your video ID
2. enhanced_yt_transcript.py (Recommended)
Full-featured script with command-line interface and error handling.
🚀 Usage
Quick Start (Fixed Original Script)
# Edit the video_id in the script, then run:
python transcribe_yt_video.py
Advanced Usage (Enhanced Script)
# Using video ID
python enhanced_yt_transcript.py dQw4w9WgXcQ
# Using full YouTube URL
python enhanced_yt_transcript.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# Custom output directory
python enhanced_yt_transcript.py dQw4w9WgXcQ --output my_captions
# Specify preferred languages
python enhanced_yt_transcript.py dQw4w9WgXcQ --languages en es fr
# Get help
python enhanced_yt_transcript.py --help
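For readers curious how a command line like the ones above could be parsed, here is a minimal argparse sketch. The flag names `--output` and `--languages` come from the examples above; the function name `build_parser`, the argument name `video`, and the `["en"]` language default are assumptions, not taken from the script.

```python
import argparse


def build_parser():
    """Build a parser that accepts the invocations shown in this README."""
    parser = argparse.ArgumentParser(
        description="Save each YouTube caption segment to a numbered file."
    )
    # Positional argument: a bare video ID or a full YouTube URL
    parser.add_argument("video", help="YouTube video ID or full URL")
    # "captions" matches the default directory shown in the examples
    parser.add_argument("--output", default="captions", help="output directory")
    # The ["en"] default is an assumption made for this sketch
    parser.add_argument(
        "--languages",
        nargs="+",
        default=["en"],
        help="preferred language codes, in priority order",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.video, args.output, args.languages)
```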
📊 Output Structure
After running, you'll get:
captions/
├── cc1.txt # First caption segment
├── cc2.txt # Second caption segment
├── cc3.txt # Third caption segment
├── ...
├── cc150.txt # Last segment (example)
└── summary.txt # Summary information
Each cc#.txt file contains just the text from that caption segment.
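Because the output directory is reused between runs, a shorter transcript could otherwise leave higher-numbered files from an earlier run behind. The sketch below shows one way to clear them first. The function name `clear_old_captions` is hypothetical, and limiting the cleanup to cc#.txt and summary.txt is an assumption about what counts as an "old file".

```python
import re
from pathlib import Path

# Matches cc1.txt, cc22.txt, ... but not unrelated files such as notes.txt
CAPTION_FILE = re.compile(r"cc\d+\.txt")


def clear_old_captions(output_dir="captions"):
    """Remove caption files and the summary left over from a previous run.

    Creates the directory if it does not exist and returns the number of
    files removed. Other files in the directory are left untouched.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    removed = 0
    for path in out.iterdir():
        is_caption = CAPTION_FILE.fullmatch(path.name) is not None
        if path.is_file() and (is_caption or path.name == "summary.txt"):
            path.unlink()
            removed += 1
    return removed
```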
🔧 Features
Fixed Original Script
- ✅ Separate files for each caption segment
- ✅ Sequential numbering (cc1.txt, cc2.txt, etc.)
- ✅ UTF-8 encoding for international characters
- ✅ Progress feedback showing what's being written
Enhanced Script
- ✅ Command-line interface - no need to edit code
- ✅ URL parsing - accepts YouTube URLs or video IDs
- ✅ Language selection - prefer specific languages
- ✅ Error handling - graceful failures with helpful messages
- ✅ Progress tracking - shows processing status
- ✅ Summary file - metadata about the download
- ✅ Directory cleanup - removes old files before new download
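The URL parsing feature can be sketched with the standard library alone. This is a sketch under stated assumptions, not the enhanced script's implementation: the function name `extract_video_id` is hypothetical, and support for short youtu.be links is an extra added for illustration.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(value):
    """Return the video ID from a YouTube URL, or the value itself if it is a bare ID."""
    parsed = urlparse(value)
    if not parsed.scheme:
        # No "https://" prefix: treat the input as a bare video ID
        return value

    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path (assumed extra, see lead-in)
        return parsed.path.lstrip("/")
    if host.endswith("youtube.com"):
        # Standard watch URLs carry the ID in the "v" query parameter
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    raise ValueError(f"Could not find a video ID in: {value}")
```

Raising `ValueError` for unrecognized URLs lets the caller print a helpful message instead of requesting a transcript for a malformed ID.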
📋 Requirements
Install the required package:
pip install youtube-transcript-api
💡 Usage Examples
Example 1: Educational Video
python enhanced_yt_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID" --output lecture_notes
Example 2: Multi-language Content
python enhanced_yt_transcript.py VIDEO_ID --languages en es --output multilang_captions
Example 3: Quick Processing
python enhanced_yt_transcript.py VIDEO_ID
# Creates captions/cc1.txt, captions/cc2.txt, etc.
🔍 Output Preview
When the script runs, you'll see:
🎬 Processing video ID: dQw4w9WgXcQ
✅ Found auto-generated or default transcript
📁 Created directory: captions
📝 Writing 156 segments...
📄 cc10.txt: Never gonna give you up, never gonna let you...
📄 cc20.txt: We've known each other for so long...
📄 cc30.txt: Your heart's been aching but you're too shy...
🎉 Success!
📊 Total segments: 156
📁 Files saved in: /full/path/to/captions/
📋 Summary saved to: captions/summary.txt
🛠️ Troubleshooting
Common Issues:
- "No transcript found"
  - Video might not have captions/transcripts
  - Try a different video with confirmed captions
- "Transcripts are disabled"
  - Video owner disabled transcripts
  - Try a different video
- Module not found
  - Run: pip install youtube-transcript-api
Testing:
Use the test script to verify everything works:
python test_transcript.py
(Remember to replace TEST_VIDEO_ID with a real video ID)
📝 File Contents Example
cc1.txt:
Welcome to this tutorial
cc2.txt:
Today we'll be learning about
cc3.txt:
the basics of programming
summary.txt:
YouTube Video ID: dQw4w9WgXcQ
Total segments: 156
Files: cc1.txt to cc156.txt
Generated: enhanced_yt_transcript.py
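A summary file in the format shown above could be produced along these lines. The function name `write_summary` and its parameters are hypothetical; only the four output lines are taken from the example.

```python
from pathlib import Path


def write_summary(output_dir, video_id, total_segments):
    """Write summary.txt in the four-line format shown in this README."""
    lines = [
        f"YouTube Video ID: {video_id}",
        f"Total segments: {total_segments}",
        f"Files: cc1.txt to cc{total_segments}.txt",
        "Generated: enhanced_yt_transcript.py",
    ]
    path = Path(output_dir) / "summary.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```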
🎯 Perfect For:
- Content analysis - process each caption separately
- AI training data - individual text segments
- Research projects - granular transcript analysis
- Content creation - extract specific quotes/segments
- Translation work - process segments individually
The script is now fixed to write each caption segment to separate cc#.txt files as requested! 🎉