YouTube Transcripts Downloader
Advanced tool for downloading YouTube video transcripts with metadata support and n8n integration
Project inspired by youtube-transcript-api
Features
- Rich Metadata: video titles, channel names, view counts, descriptions
- Smart Naming: files are automatically named using video titles
- Multiple Formats: Markdown, JSON, SRT, VTT, and plain text
- n8n Integration: full API server for workflow automation
- Multi-language: support for multiple transcript languages
- Base64 Encoding: automatic base64 encoding for transfer through text-only channels
Prerequisites
- Python 3.6+
- pip package manager
- (Optional) n8n for workflow automation
Quick Start
CLI Usage
# Install dependencies
pip install -r requirements.txt
# Download transcript (uses video title as filename)
python youtube_transcript_downloader.py ABC123xyz
# Specify language and format
python youtube_transcript_downloader.py ABC123xyz --languages pl --format json
API Server
# Start API server
./start_api.sh
# Or manually
python api_server.py
The API will be available at http://localhost:5000
Project Structure
yt-transcripts/
├── youtube_transcript_downloader.py   # Main CLI script
├── api_server.py                      # API server for n8n
├── start_api.sh                       # API startup script
├── requirements.txt                   # Dependencies
├── Transcripts/                       # Transcripts folder
├── README.md                          # Documentation
├── N8N_INTEGRATION.md                 # n8n integration docs
├── LICENSE                            # MIT License
└── .gitignore                         # Git ignore file
CLI Tool
Basic Usage
Download Transcript
# Download by video ID
python youtube_transcript_downloader.py ABC123xyz
# Or use full URL
python youtube_transcript_downloader.py "https://www.youtube.com/watch?v=ABC123xyz"
Language Options
# Specify preferred languages
python youtube_transcript_downloader.py ABC123xyz --languages pl en de
# Translate transcript
python youtube_transcript_downloader.py ABC123xyz --translate de
Output Formats
Markdown (default):
python youtube_transcript_downloader.py ABC123xyz
Files are saved as Transcripts/Video Title.md and Transcripts/Video Title.b64
JSON:
python youtube_transcript_downloader.py ABC123xyz --format json --output transcript.json
SRT subtitles:
python youtube_transcript_downloader.py ABC123xyz --format srt --output transcript.srt
VTT subtitles:
python youtube_transcript_downloader.py ABC123xyz --format vtt --output transcript.vtt
Plain text:
python youtube_transcript_downloader.py ABC123xyz --format text --output transcript.txt
Advanced Options
# List available transcripts
python youtube_transcript_downloader.py ABC123xyz --list
# Preserve HTML formatting
python youtube_transcript_downloader.py ABC123xyz --preserve-formatting
# Exclude auto-generated transcripts
python youtube_transcript_downloader.py ABC123xyz --exclude-generated
# Exclude manual transcripts
python youtube_transcript_downloader.py ABC123xyz --exclude-manually-created
# Disable base64 encoding
python youtube_transcript_downloader.py ABC123xyz --no-base64
n8n Integration
The project includes a full API server for seamless integration with n8n workflows.
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /transcript | POST | Main transcript download |
| /transcripts/list | POST | List available languages |
| /metadata | POST | Get video metadata only |
| /health | GET | Health check |
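The /transcript endpoint in the table above can also be called from Python. The following is a minimal sketch that uses only the standard library. The payload fields are the ones shown in this README's request example; the helper names `build_transcript_request` and `fetch_transcript` are hypothetical and not part of the project.

```python
import json
import urllib.request

API_BASE = "http://localhost:5000"  # default address given in this README


def build_transcript_request(video_id, languages=("en",), fmt="md", base_url=API_BASE):
    """Build (but do not send) a POST request for the /transcript endpoint."""
    payload = {
        "video_id": video_id,
        "languages": list(languages),
        "format": fmt,
        "save_to_file": True,
        "include_metadata": True,
    }
    return urllib.request.Request(
        url=f"{base_url}/transcript",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_transcript(video_id, **kwargs):
    """Send the request and return the parsed JSON body (needs a running API server)."""
    request = build_transcript_request(video_id, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API server running, `fetch_transcript("ABC123xyz", languages=["pl", "en"])` should return a dictionary shaped like the example response in this README. Keeping request construction separate from sending makes the payload easy to inspect without network access.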
Example API Request
curl -X POST http://localhost:5000/transcript \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "ABC123xyz",
    "languages": ["pl", "en"],
    "format": "md",
    "save_to_file": true,
    "include_metadata": true
  }'
Example Response
{
  "success": true,
  "video_id": "ABC123xyz",
  "format": "md",
  "metadata": {
    "title": "Video Title",
    "channel": "Channel Name",
    "views": 1000000,
    "publish_date": "2023-01-01",
    "description": "Video description",
    "url": "https://www.youtube.com/watch?v=ABC123xyz"
  },
  "saved_to": "Transcripts/Video Title.md",
  "base64_file": "Transcripts/Video Title.b64",
  "base64": "IyBWaWRlbyBUaXRsZQ0K...",
  "transcript": "# Video Title\n\n**Kanał:** Channel Name\n\n..."
}
n8n Workflow Setup
HTTP Request Node Configuration
Method: POST
URL: http://localhost:5000/transcript
Headers: Content-Type: application/json
Body (JSON):
{
  "video_id": "{{$json.video_id}}",
  "languages": ["pl", "en"],
  "format": "md",
  "save_to_file": true,
  "encode_base64": true
}
Sample n8n Workflows
Workflow 1: Automated Transcript Collection
Trigger (Schedule) →
Google Sheets (Get video IDs) →
HTTP Request (Our API) →
Set (Process data) →
Google Sheets (Save results)
Workflow 2: Content Analysis
Webhook (New video) →
HTTP Request (Get transcript) →
AI Service (Analyze) →
Slack (Send notification)
Go CLI Tool
Alternative Go implementation with simplified commands.
Commands
# Save transcript to file
yt-transcripts save -i ABC123xyz -l en -o transcript.txt
# List available transcripts
yt-transcripts list -i ABC123xyz
# Fetch transcript
yt-transcripts fetch -i ABC123xyz -l en
Options
| Command | Options | Description |
|---|---|---|
| save | -i, --id | Video ID |
| | -l, --language | Language code |
| | -o, --output | Output filename |
| list | -i, --id | Video ID |
| fetch | -i, --id | Video ID |
| | -l, --language | Language code |
Global Options
--help, -h      Display command help message
--version, -v   Show app version
Features in Detail
Rich Metadata Extraction
Every transcript includes comprehensive video information:
- Video title and description
- Channel name
- View count
- Publication date
- Direct video URL
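When consuming the API from Python, the fields listed above can be given a typed shape. The sketch below assumes the key names from the "metadata" object in this README's example response; `VideoMetadata` and `metadata_from_dict` are hypothetical names used for illustration only.

```python
from typing import Any, Dict, NamedTuple


class VideoMetadata(NamedTuple):
    title: str
    channel: str
    views: int
    publish_date: str
    description: str
    url: str


def metadata_from_dict(data: Dict[str, Any]) -> VideoMetadata:
    """Build a record from a "metadata" object, tolerating missing keys."""
    return VideoMetadata(
        title=data.get("title", ""),
        channel=data.get("channel", ""),
        views=int(data.get("views", 0)),
        publish_date=data.get("publish_date", ""),
        description=data.get("description", ""),
        url=data.get("url", ""),
    )


# Values copied from the example response in this README.
sample = {
    "title": "Video Title",
    "channel": "Channel Name",
    "views": 1000000,
    "publish_date": "2023-01-01",
    "description": "Video description",
    "url": "https://www.youtube.com/watch?v=ABC123xyz",
}
meta = metadata_from_dict(sample)
print(f"{meta.title} by {meta.channel} ({meta.views:,} views)")
# Video Title by Channel Name (1,000,000 views)
```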
Smart File Naming
Files are automatically named using video titles for easy organization:
Transcripts/How to Build a Website.md
Transcripts/Python Tutorial for Beginners.json
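Video titles often contain characters that are not valid in file names. This README does not describe how the tool handles them, so the following is only a sketch of one possible approach; the function `title_to_filename` and its replacement rules are assumptions, not the project's actual behavior.

```python
import re


def title_to_filename(title: str, extension: str = "md", max_length: int = 150) -> str:
    """Turn a video title into a filesystem-friendly file name (illustrative rules)."""
    # Replace characters that are invalid on common filesystems with a space.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', " ", title)
    # Collapse whitespace runs and trim leading/trailing spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Truncate long titles and fall back to a generic name if nothing is left.
    cleaned = cleaned[:max_length].rstrip(" .") or "transcript"
    return f"{cleaned}.{extension}"


print(title_to_filename("How to Build a Website"))
# How to Build a Website.md
print(title_to_filename("Python: Tutorial / Beginners?", "json"))
# Python Tutorial Beginners.json
```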
Base64 Encoding
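For each .md file, a matching .b64 file is created containing the base64-encoded transcript content. The encoded file is:
- Useful for passing the transcript through text-only channels (base64 is an encoding, not encryption)
- Convenient for integration with external systems
- Optional: disable it with the --no-base64 flag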
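The relationship between a transcript and its .b64 companion can be reproduced with Python's standard `base64` module. This is a minimal sketch; the helper names are hypothetical, and the project's own implementation may differ in details such as line endings.

```python
import base64
from pathlib import Path


def encode_transcript(text: str) -> str:
    """Return the base64 form of a transcript as ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_transcript(encoded: str) -> str:
    """Recover the original transcript text from its base64 form."""
    return base64.b64decode(encoded).decode("utf-8")


def write_b64_companion(md_path: Path) -> Path:
    """Write <name>.b64 next to <name>.md and return the new path."""
    b64_path = md_path.with_suffix(".b64")
    b64_path.write_bytes(base64.b64encode(md_path.read_bytes()))
    return b64_path


encoded = encode_transcript("# Video Title\r\n")
print(encoded)
# IyBWaWRlbyBUaXRsZQ0K
```

The printed value matches the start of the "base64" field in the example API response, which decodes to the first line of the Markdown transcript.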
Auto-organization
All transcripts are automatically saved to the Transcripts/ folder with proper naming and structure.
Error Handling
Robust error handling for:
- Invalid video IDs
- Unavailable transcripts
- Age-restricted content
- Network issues
- Rate limiting
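The list above names network issues and rate limiting but does not say how they are handled. One common technique for both is retrying with exponential backoff, sketched below. The helper `with_retries`, the choice of `OSError` as the retryable error, and the delay values are all assumptions made for illustration; the exceptions the tool actually raises are not documented here.

```python
import time


def with_retries(func, attempts=4, base_delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Demo with a stand-in function that fails twice before succeeding.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network issue")
    return "transcript text"


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky_download, sleep=delays.append)
print(result, delays)
# transcript text [1.0, 2.0]
```

Errors such as an invalid video ID or a missing transcript will not succeed on a second attempt, so they are best left out of `retry_on`.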
Installation
Python Setup
# Clone repository
git clone https://github.com/yourusername/yt-transcripts.git
cd yt-transcripts
# Install dependencies
pip install -r requirements.txt
# Make scripts executable
chmod +x start_api.sh
chmod +x youtube_transcript_downloader.py
Docker Setup (Optional)
# Build and run with Docker
docker build -t youtube-transcripts .
docker run -p 5000:5000 youtube-transcripts
Testing
# Run functionality tests
python test_functionality.py
# Test CLI
python youtube_transcript_downloader.py --help
# Test API endpoints
curl http://localhost:5000/health
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| PORT | 5000 | API server port |
| DEBUG | false | Enable debug mode |
| OUTPUT_DIR | Transcripts | Output directory |
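As an illustration of how these variables and their defaults could be read in Python, here is a small sketch. `ServerConfig` and `load_config` are hypothetical names, and the set of strings treated as "true" for DEBUG is an assumption; the server's real parsing rules are not documented in this README.

```python
import os
from typing import Mapping, NamedTuple


class ServerConfig(NamedTuple):
    port: int
    debug: bool
    output_dir: str


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read PORT, DEBUG and OUTPUT_DIR, falling back to the documented defaults."""
    return ServerConfig(
        port=int(env.get("PORT", "5000")),
        # Assumption: "1", "true" and "yes" (any case) enable debug mode.
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        output_dir=env.get("OUTPUT_DIR", "Transcripts"),
    )


print(load_config({}))
# ServerConfig(port=5000, debug=False, output_dir='Transcripts')
print(load_config({"PORT": "8080", "DEBUG": "true"}))
# ServerConfig(port=8080, debug=True, output_dir='Transcripts')
```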
Use Cases
Content Creation
- Extract video content for blog posts
- Create summaries and highlights
- Generate SEO-optimized content
Research & Analysis
- Analyze video content at scale
- Extract quotes and references
- Build searchable video databases
Accessibility
- Create subtitles for videos
- Translate content to multiple languages
- Make video content text-searchable
Automation
- Integrate with n8n workflows
- Batch process video libraries
- Automated content monitoring
Limitations
- Some videos may have blocked transcript access
- Age-restricted videos may require authentication
- YouTube may rate-limit frequent requests
- Not all videos have transcripts available
License
This project is open source and available under the MIT License.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Support
For issues and questions:
- Create an issue on GitHub
- Check the N8N_INTEGRATION.md for n8n-specific help