DevZone404

YouTube Transcripts Downloader

Advanced tool for downloading YouTube video transcripts with metadata support and n8n integration

Project inspired by youtube-transcript-api

✨ Features

  • Rich Metadata: video titles, channel names, view counts, descriptions
  • Smart Naming: files are automatically named using video titles
  • Multiple Formats: Markdown, JSON, SRT, VTT, and plain text
  • n8n Integration: full API server for workflow automation
  • Multi-language: support for multiple transcript languages
  • Base64 Encoding: automatic base64 encoding for safe transfer over text-only channels

📋 Prerequisites

  • Python 3.6+
  • pip package manager
  • (Optional) n8n for workflow automation

🚀 Quick Start

CLI Usage

# Install dependencies
pip install -r requirements.txt

# Download transcript (uses video title as filename)
python youtube_transcript_downloader.py ABC123xyz

# Specify language and format
python youtube_transcript_downloader.py ABC123xyz --languages pl --format json

API Server

# Start API server
./start_api.sh

# Or manually
python api_server.py

The API will be available at http://localhost:5000 by default.

📁 Project Structure

yt-transcripts/
├── youtube_transcript_downloader.py  # Main CLI script
├── api_server.py                     # API server for n8n
├── start_api.sh                      # API startup script
├── requirements.txt                  # Dependencies
├── Transcripts/                      # Transcripts folder
├── README.md                         # Documentation
├── N8N_INTEGRATION.md                # n8n integration docs
├── LICENSE                           # MIT License
└── .gitignore                        # Git ignore file

🔧 CLI Tool

Basic Usage

Download Transcript

# Download by video ID
python youtube_transcript_downloader.py ABC123xyz

# Or use the full URL
python youtube_transcript_downloader.py "https://www.youtube.com/watch?v=ABC123xyz"
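
Since the CLI accepts either a bare video ID or a full URL, the argument has to be normalized somewhere. The following is a minimal sketch of how that could be done, not the tool's actual code; the function name extract_video_id and the set of recognized hosts are assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hosts assumed for this sketch; the real tool may accept more URL forms.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a common YouTube URL."""
    parsed = urlparse(value)
    if not parsed.scheme:
        # No scheme present: treat the argument as a bare video ID.
        return value
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0]
    if host in YOUTUBE_HOSTS:
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    raise ValueError(f"Could not find a video ID in {value!r}")
```

With this helper, both invocations shown above would resolve to the same ID, ABC123xyz.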

Language Options

# Specify preferred languages
python youtube_transcript_downloader.py ABC123xyz --languages pl en de

# Translate transcript
python youtube_transcript_downloader.py ABC123xyz --translate de
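
The order of the codes passed to --languages is presumably a priority list, as it is in youtube-transcript-api. A minimal sketch of that fallback logic, assuming the list of available codes has already been fetched (pick_language is a hypothetical helper, not part of the tool):

```python
from typing import Optional, Sequence


def pick_language(preferred: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Return the first preferred language code that is available, else None."""
    for code in preferred:
        if code in available:
            return code
    return None
```

For example, with --languages pl en de and a video that only offers English and German transcripts, this sketch would select en.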

Output Formats

Markdown (default):

python youtube_transcript_downloader.py ABC123xyz

Files are saved as Transcripts/Video Title.md and Transcripts/Video Title.b64, where "Video Title" is the title of the video.

JSON:

python youtube_transcript_downloader.py ABC123xyz --format json --output transcript.json

SRT subtitles:

python youtube_transcript_downloader.py ABC123xyz --format srt --output transcript.srt

VTT subtitles:

python youtube_transcript_downloader.py ABC123xyz --format vtt --output transcript.vtt

Plain text:

python youtube_transcript_downloader.py ABC123xyz --format text --output transcript.txt
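
To make the SRT output concrete, here is a rough sketch of how transcript segments could be rendered as SRT cues. It assumes each segment is a dict with text, start, and duration keys (the shape youtube-transcript-api returns); the function names are illustrative, and the tool's own formatter may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}\n"
        )
    return "\n".join(cues)
```

VTT differs mainly in its WEBVTT header line and in using a period instead of a comma before the milliseconds.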

Advanced Options

# List available transcripts
python youtube_transcript_downloader.py ABC123xyz --list

# Preserve HTML formatting
python youtube_transcript_downloader.py ABC123xyz --preserve-formatting

# Exclude auto-generated transcripts
python youtube_transcript_downloader.py ABC123xyz --exclude-generated

# Exclude manual transcripts
python youtube_transcript_downloader.py ABC123xyz --exclude-manually-created

# Disable base64 encoding
python youtube_transcript_downloader.py ABC123xyz --no-base64

🌐 n8n Integration

The project includes a full API server for seamless integration with n8n workflows.

API Endpoints

Endpoint            Method   Description
/transcript         POST     Main transcript download
/transcripts/list   POST     List available languages
/metadata           POST     Get video metadata only
/health             GET      Health check

Example API Request

curl -X POST http://localhost:5000/transcript \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "ABC123xyz",
    "languages": ["pl", "en"],
    "format": "md",
    "save_to_file": true,
    "include_metadata": true
  }'
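
The same request can be issued from Python using only the standard library. This is a sketch that mirrors the curl call above; it assumes the server is running on the default port, and the helper names build_transcript_request and fetch_transcript are made up for illustration.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/transcript"  # default address from this README


def build_transcript_request(video_id, languages=("pl", "en"), fmt="md"):
    """Build a POST request carrying the same JSON body as the curl example."""
    payload = {
        "video_id": video_id,
        "languages": list(languages),
        "format": fmt,
        "save_to_file": True,
        "include_metadata": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_transcript(video_id):
    """Send the request and return the parsed JSON response (needs a running server)."""
    request = build_transcript_request(video_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect without a live server.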

Example Response

{
  "success": true,
  "video_id": "ABC123xyz",
  "format": "md",
  "metadata": {
    "title": "Video Title",
    "channel": "Channel Name",
    "views": 1000000,
    "publish_date": "2023-01-01",
    "description": "Video description",
    "url": "https://www.youtube.com/watch?v=ABC123xyz"
  },
  "saved_to": "Transcripts/Video Title.md",
  "base64_file": "Transcripts/Video Title.b64",
  "base64": "IyBWaWRlbyBUaXRsZQ0K...",
  "transcript": "# Video Title\n\n**Kanał:** Channel Name\n\n..."
}
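
A consumer of this response can recover the transcript text from the base64 field. The sketch below assumes the payload is UTF-8 text and that success is false on errors; decode_transcript is a hypothetical helper, and the sample string is only the visible prefix of the truncated value shown above.

```python
import base64


def decode_transcript(response: dict) -> str:
    """Decode the base64 field of an API response into text."""
    if not response.get("success"):
        raise ValueError("API response did not report success")
    return base64.b64decode(response["base64"]).decode("utf-8")


# Only the first 20 characters of the sample value; the rest is elided in the README.
sample = {"success": True, "base64": "IyBWaWRlbyBUaXRsZQ0K"}
print(repr(decode_transcript(sample)))
```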

n8n Workflow Setup

HTTP Request Node Configuration

Method: POST
URL: http://localhost:5000/transcript
Headers: Content-Type: application/json

Body (JSON):

{
  "video_id": "{{$json.video_id}}",
  "languages": ["pl", "en"],
  "format": "md",
  "save_to_file": true,
  "encode_base64": true
}

Sample n8n Workflows

Workflow 1: Automated Transcript Collection

Trigger (Schedule) β†’
Google Sheets (Get video IDs) β†’
HTTP Request (Our API) β†’
Set (Process data) β†’
Google Sheets (Save results)

Workflow 2: Content Analysis

Webhook (New video) β†’
HTTP Request (Get transcript) β†’
AI Service (Analyze) β†’
Slack (Send notification)

🛠️ Go CLI Tool

Alternative Go implementation with simplified commands.

Commands

# Save transcript to file
yt-transcripts save -i ABC123xyz -l en -o transcript.txt

# List available transcripts
yt-transcripts list -i ABC123xyz

# Fetch transcript
yt-transcripts fetch -i ABC123xyz -l en

Options

Command   Options          Description
save      -i, --id         Video ID
          -l, --language   Language code
          -o, --output     Output filename
list      -i, --id         Video ID
fetch     -i, --id         Video ID
          -l, --language   Language code

Global Options

--help, -h      Display command help message
--version, -v   Show app version

📦 Features in Detail

Rich Metadata Extraction

Every transcript includes comprehensive video information:

  • Video title and description
  • Channel name
  • View count
  • Publication date
  • Direct video URL

Smart File Naming

Files are automatically named using video titles for easy organization:

  • Transcripts/How to Build a Website.md
  • Transcripts/Python Tutorial for Beginners.json
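
Video titles can contain characters that are not valid in file names, so some sanitizing step is needed before a title becomes a path. The sketch below shows one possible approach; safe_filename, the character set it strips, and the length limit are all assumptions, not the tool's documented behavior.

```python
import re


def safe_filename(title: str, extension: str = "md", max_length: int = 120) -> str:
    """Turn a video title into a file name that is safe on common filesystems."""
    # Drop characters that Windows or POSIX paths reject, plus control characters.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", title)
    # Collapse whitespace runs and trim leading/trailing spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Cap the length and fall back to a placeholder if nothing is left.
    cleaned = cleaned[:max_length].rstrip(" .") or "untitled"
    return f"{cleaned}.{extension}"
```

The result would then be joined with the Transcripts/ folder to form the final path.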

Base64 Encoding

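For each .md file, a matching .b64 file is created:

  • Contains the base64-encoded transcript content
  • Useful for passing transcripts through JSON payloads and other text-only channels without escaping issues (base64 is an encoding, not encryption)
  • Simplifies integration with external systems
  • Can be disabled with the --no-base64 flag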
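
As an illustration of the idea, the following sketch writes a .b64 file next to an existing Markdown file. It is an assumption about how such a sidecar could be produced, not the tool's code; write_b64_sidecar is a hypothetical name.

```python
import base64
from pathlib import Path


def write_b64_sidecar(md_path: Path) -> Path:
    """Write a base64 copy of md_path next to it, using the .b64 suffix."""
    b64_path = md_path.with_suffix(".b64")
    encoded = base64.b64encode(md_path.read_bytes())
    b64_path.write_bytes(encoded)
    return b64_path
```

Decoding the sidecar with base64.b64decode yields the original file bytes, so the two files always carry the same content.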

Auto-organization

All transcripts are automatically saved to the Transcripts/ folder with proper naming and structure.

Error Handling

Robust error handling for:

  • Invalid video IDs
  • Unavailable transcripts
  • Age-restricted content
  • Network issues
  • Rate limiting
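
For transient failures such as network issues or rate limiting, a common pattern is to retry with exponential backoff. The sketch below shows that general technique; it is not taken from this project, and the exception types, attempt count, and delays are placeholder assumptions.

```python
import time


def retry_with_backoff(func, attempts=4, base_delay=1.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call func(), retrying on the given exceptions with doubling delays."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: let the last error propagate to the caller.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Permanent errors, such as an invalid video ID or a video without transcripts, should not be retried and are deliberately left out of retry_on.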

🚀 Installation

Python Setup

# Clone repository
git clone https://github.com/yourusername/yt-transcripts.git
cd yt-transcripts

# Install dependencies
pip install -r requirements.txt

# Make scripts executable
chmod +x start_api.sh
chmod +x youtube_transcript_downloader.py

Docker Setup (Optional)

# Build and run with Docker
docker build -t youtube-transcripts .
docker run -p 5000:5000 youtube-transcripts

🧪 Testing

# Run functionality tests
python test_functionality.py

# Test CLI
python youtube_transcript_downloader.py --help

# Test API endpoints
curl http://localhost:5000/health

⚙️ Configuration

Environment Variables

Variable     Default       Description
PORT         5000          API server port
DEBUG        false         Enable debug mode
OUTPUT_DIR   Transcripts   Output directory
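
A server could read these variables along the following lines. This is a sketch under assumptions: load_config is a hypothetical helper, and the set of values treated as true for DEBUG is a guess, since the README only documents the default.

```python
import os


def load_config(env=None):
    """Read PORT, DEBUG, and OUTPUT_DIR, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("PORT", "5000")),
        # Assumed truthy spellings; the actual server may accept others.
        "debug": env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        "output_dir": env.get("OUTPUT_DIR", "Transcripts"),
    }
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests.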

📝 Use Cases

Content Creation

  • Extract video content for blog posts
  • Create summaries and highlights
  • Generate SEO-optimized content

Research & Analysis

  • Analyze video content at scale
  • Extract quotes and references
  • Build searchable video databases

Accessibility

  • Create subtitles for videos
  • Translate content to multiple languages
  • Make video content text-searchable

Automation

  • Integrate with n8n workflows
  • Batch process video libraries
  • Automated content monitoring

⚠️ Limitations

  • Some videos may have blocked transcript access
  • Age-restricted videos may require authentication
  • YouTube may rate-limit frequent requests
  • Not all videos have transcripts available

📄 License

This project is open source and available under the MIT License.

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📞 Support

For issues and questions:

  • Create an issue on GitHub
  • Check N8N_INTEGRATION.md for n8n-specific help