🎥 YouTube Transcripts Downloader

Advanced tool for downloading YouTube video transcripts with metadata support and n8n integration.

This project is inspired by youtube-transcript-api.

✨ Features

Rich Metadata

Video titles, channel names, view counts, descriptions

Smart Naming

Files automatically named using video titles

Multiple Formats

Markdown, JSON, SRT, VTT, plain text support

n8n Integration

Full API server for workflow automation

Multi-language

Support for multiple transcript languages

Base64 Encoding

Automatic base64 encoding for safe transfer through text-only channels (an encoding, not encryption)

📋 Prerequisites

Python 3.6+
pip package manager
(Optional) n8n for workflow automation

🚀 Quick Start

CLI Usage

# Install dependencies
pip install -r requirements.txt

# Download transcript (uses video title as filename)
python youtube_transcript_downloader.py ABC123xyz

# Specify language and format
python youtube_transcript_downloader.py ABC123xyz --languages pl --format json

API Server

# Start API server
./start_api.sh

# Or manually
python api_server.py

The API will be available at http://localhost:5000.

📁 Project Structure

yt-transcripts/
├── youtube_transcript_downloader.py  # Main CLI script
├── api_server.py                     # API server for n8n
├── start_api.sh                      # API startup script
├── requirements.txt                  # Dependencies
├── Transcripts/                      # Transcripts folder
├── README.md                         # Documentation
├── N8N_INTEGRATION.md               # n8n integration docs
├── LICENSE                           # MIT License
└── .gitignore                       # Git ignore file

🔧 CLI Tool

Basic Usage

Download Transcript

# Download by video ID
python youtube_transcript_downloader.py ABC123xyz

# Or use the full URL
python youtube_transcript_downloader.py "https://www.youtube.com/watch?v=ABC123xyz"
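
Since the CLI accepts either a bare video ID or a full URL, the input has to be normalized to an ID at some point. The following is a minimal sketch of how that could be done with the standard library. The function name `extract_video_id` and the set of URL shapes it handles are assumptions for illustration, not the script's actual code.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a YouTube URL (hypothetical helper)."""
    value = value.strip()
    # A bare ID has no scheme, so anything without "://" is treated as the ID itself.
    if "://" not in value:
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0]

    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        query = parse_qs(parsed.query)
        if "v" in query:
            return query["v"][0]
        # Embedded and Shorts links: /embed/<id>, /shorts/<id>
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] in {"embed", "shorts"}:
            return parts[1]

    raise ValueError(f"Could not find a video ID in: {value!r}")
```

With this helper, `extract_video_id("https://www.youtube.com/watch?v=ABC123xyz")` and `extract_video_id("ABC123xyz")` both yield `ABC123xyz`.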

Language Options

# Specify preferred languages
python youtube_transcript_downloader.py ABC123xyz --languages pl en de

# Translate transcript
python youtube_transcript_downloader.py ABC123xyz --translate de

Output Formats

Markdown (default):

python youtube_transcript_downloader.py ABC123xyz

Files are saved as Transcripts/Video Title.md and Transcripts/Video Title.b64.

JSON:

python youtube_transcript_downloader.py ABC123xyz --format json --output transcript.json

SRT subtitles:

python youtube_transcript_downloader.py ABC123xyz --format srt --output transcript.srt

VTT subtitles:

python youtube_transcript_downloader.py ABC123xyz --format vtt --output transcript.vtt

Plain text:

python youtube_transcript_downloader.py ABC123xyz --format text --output transcript.txt
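
The SRT and VTT formats differ mainly in their timestamp separator (a comma in SRT, a period in VTT) and in the `WEBVTT` header line. The sketch below illustrates that difference. It assumes each transcript segment is a dict with `text`, `start`, and `duration` keys (times in seconds); the function names are hypothetical, and the tool's real output may differ in detail.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments) -> str:
    """Render segments as SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """Render segments as WebVTT: header line, period before the milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = seg["start"]
        end = start + seg["duration"]
        lines.append(f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}")
        lines.append(seg["text"])
        lines.append("")
    return "\n".join(lines)
```

For a single segment `{"text": "Hello", "start": 0.0, "duration": 1.5}`, the SRT cue line is `00:00:00,000 --> 00:00:01,500` and the VTT cue line is `00:00:00.000 --> 00:00:01.500`.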

Advanced Options

# List available transcripts
python youtube_transcript_downloader.py ABC123xyz --list

# Preserve HTML formatting
python youtube_transcript_downloader.py ABC123xyz --preserve-formatting

# Exclude auto-generated transcripts
python youtube_transcript_downloader.py ABC123xyz --exclude-generated

# Exclude manual transcripts
python youtube_transcript_downloader.py ABC123xyz --exclude-manually-created

# Disable base64 encoding
python youtube_transcript_downloader.py ABC123xyz --no-base64

🌐 n8n Integration

The project includes a full API server for seamless integration with n8n workflows.

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/transcript` | POST | Main transcript download |
| `/transcripts/list` | POST | List available languages |
| `/metadata` | POST | Get video metadata only |
| `/health` | GET | Health check |

Example API Request

curl -X POST http://localhost:5000/transcript \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "ABC123xyz",
    "languages": ["pl", "en"],
    "format": "md",
    "save_to_file": true,
    "include_metadata": true
  }'

Example Response

{
  "success": true,
  "video_id": "ABC123xyz",
  "format": "md",
  "metadata": {
    "title": "Video Title",
    "channel": "Channel Name",
    "views": 1000000,
    "publish_date": "2023-01-01",
    "description": "Video description",
    "url": "https://www.youtube.com/watch?v=ABC123xyz"
  },
  "saved_to": "Transcripts/Video Title.md",
  "base64_file": "Transcripts/Video Title.b64",
  "base64": "IyBWaWRlbyBUaXRsZQ0K...",
  "transcript": "# Video Title\n\n**Kanał:** Channel Name\n\n..."
}
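
The same request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library. The endpoint, payload fields, and default address come from the examples above; the function names `build_transcript_request` and `fetch_transcript` are hypothetical and not part of the project.

```python
import json
import urllib.request

API_BASE = "http://localhost:5000"  # default address of the API server


def build_transcript_request(video_id, languages=("pl", "en"), fmt="md"):
    """Build (but do not send) a POST request for the /transcript endpoint."""
    payload = {
        "video_id": video_id,
        "languages": list(languages),
        "format": fmt,
        "save_to_file": True,
        "include_metadata": True,
    }
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_transcript(video_id, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_transcript_request(video_id, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    # The example response carries a "success" flag; treat a falsy value as an error.
    if not result.get("success"):
        raise RuntimeError(f"API reported failure: {result}")
    return result


# Usage (requires the API server to be running):
#   result = fetch_transcript("ABC123xyz")
#   print(result["metadata"]["title"])
```

Building the request separately from sending it makes the payload easy to inspect or test without a running server.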

n8n Workflow Setup

HTTP Request Node Configuration

Method: POST
URL: http://localhost:5000/transcript
Headers: Content-Type: application/json

Body (JSON):

{
  "video_id": "{{$json.video_id}}",
  "languages": ["pl", "en"],
  "format": "md",
  "save_to_file": true,
  "encode_base64": true
}

Sample n8n Workflows

Workflow 1: Automated Transcript Collection

Trigger (Schedule) →
Google Sheets (Get video IDs) →
HTTP Request (Our API) →
Set (Process data) →
Google Sheets (Save results)

Workflow 2: Content Analysis

Webhook (New video) →
HTTP Request (Get transcript) →
AI Service (Analyze) →
Slack (Send notification)

🛠️ Go CLI Tool

An alternative implementation in Go with a simpler command set.

Commands

# Save transcript to file
yt-transcripts save -i ABC123xyz -l en -o transcript.txt

# List available transcripts
yt-transcripts list -i ABC123xyz

# Fetch transcript
yt-transcripts fetch -i ABC123xyz -l en

Options

| Command | Option | Description |
|---------|--------|-------------|
| `save` | `-i, --id` | Video ID |
| | `-l, --language` | Language code |
| | `-o, --output` | Output filename |
| `list` | `-i, --id` | Video ID |
| `fetch` | `-i, --id` | Video ID |
| | `-l, --language` | Language code |

Global Options

--help, -h      Display command help message
--version, -v   Show app version

📦 Features in Detail

Rich Metadata Extraction

Every transcript includes comprehensive video information:

Video title and description
Channel name
View count
Publication date
Direct video URL
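
When consuming this metadata in your own scripts, it can help to give it a typed shape. The sketch below is one possible way to do that. The field names are taken from the `metadata` object in the example API response in this README; the `VideoMetadata` class itself is hypothetical and not part of the project.

```python
from dataclasses import dataclass, fields


@dataclass
class VideoMetadata:
    """Typed view of the "metadata" object in the example API response."""
    title: str
    channel: str
    views: int
    publish_date: str
    description: str
    url: str

    @classmethod
    def from_response(cls, response: dict) -> "VideoMetadata":
        """Pick the known fields out of response["metadata"], ignoring extras."""
        meta = response["metadata"]
        return cls(**{f.name: meta[f.name] for f in fields(cls)})

    def summary(self) -> str:
        """One-line human-readable summary."""
        return (
            f"{self.title} ({self.channel}, {self.views:,} views, "
            f"published {self.publish_date})"
        )
```

A missing field raises `KeyError`, which surfaces schema changes early instead of silently producing partial records.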

Smart File Naming

Files are automatically named using video titles for easy organization:

Transcripts/How to Build a Website.md
Transcripts/Python Tutorial for Beginners.json
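
Video titles can contain characters that are not valid in filenames, so some sanitizing step is needed before a title becomes a path. The following sketch shows one conservative approach. The function name `title_to_filename`, the replacement character, and the length limit are all assumptions; the tool's actual rules are not documented here.

```python
import re


def title_to_filename(title: str, extension: str = "md", max_length: int = 150) -> str:
    """Turn a video title into a filesystem-safe filename (hypothetical helper)."""
    # Replace path separators, characters Windows rejects, and control characters.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)
    # Collapse runs of whitespace and trim leading/trailing spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Keep the name reasonably short and fall back to a placeholder if nothing is left.
    cleaned = cleaned[:max_length].rstrip(" .") or "untitled"
    return f"{cleaned}.{extension}"
```

Plain titles pass through unchanged, so `title_to_filename("How to Build a Website")` gives `How to Build a Website.md`, while a title such as `AC/DC: Live?` becomes `AC_DC_ Live_.md`.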

Base64 Encoding

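For each .md file, a matching .b64 file is created containing the base64-encoded transcript content. This is useful for:

Passing transcripts through text-only channels (base64 is an encoding, not encryption)
Integration with external systems

Base64 output can be disabled with the --no-base64 flag.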
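
To read a .b64 file back, or to produce one yourself, the standard `base64` module is enough. The sketch below assumes UTF-8 text and a sidecar file named after the .md file; the helper names are hypothetical. Note that base64 is reversible by anyone, so it protects against character mangling in transit, not against disclosure.

```python
import base64
from pathlib import Path


def encode_text(text: str) -> str:
    """Encode UTF-8 text as a base64 ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text(encoded: str) -> str:
    """Decode a base64 string back into UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")


def write_b64_sidecar(md_path: Path) -> Path:
    """Write <name>.b64 next to <name>.md and return the new path."""
    b64_path = md_path.with_suffix(".b64")
    content = md_path.read_text(encoding="utf-8")
    b64_path.write_text(encode_text(content), encoding="ascii")
    return b64_path
```

As a check against the example response in this README, `encode_text("# Video Title\r\n")` returns `IyBWaWRlbyBUaXRsZQ0K`, which matches the visible prefix of its `base64` field.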

Auto-organization

All transcripts are saved automatically to the Transcripts/ folder, using the title-based names described above.

Error Handling

Robust error handling for:

Invalid video IDs
Unavailable transcripts
Age-restricted content
Network issues
Rate limiting
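
For transient failures such as network issues and rate limiting, callers often wrap requests in a retry loop with exponential backoff. The sketch below shows that general technique; it is not taken from the project's code. The function name, default delays, and the choice of retryable exception types are assumptions that you would adjust to the errors your client actually raises.

```python
import random
import time


def retry_with_backoff(
    func,
    attempts=4,
    base_delay=1.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call func(); on a retryable error wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff (1s, 2s, 4s, ...) with a little random jitter
            # so that many clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1 * base_delay)
            sleep(delay)
```

Errors outside `retry_on`, such as an invalid video ID, are not retried and propagate immediately, since repeating the call would not help.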

🚀 Installation

Python Setup

# Clone repository
git clone https://github.com/yourusername/yt-transcripts.git
cd yt-transcripts

# Install dependencies
pip install -r requirements.txt

# Make scripts executable
chmod +x start_api.sh
chmod +x youtube_transcript_downloader.py

Docker Setup (Optional)

# Build and run with Docker
docker build -t youtube-transcripts .
docker run -p 5000:5000 youtube-transcripts

🧪 Testing

# Run functionality tests
python test_functionality.py

# Test CLI
python youtube_transcript_downloader.py --help

# Test API endpoints
curl http://localhost:5000/health