Universal YouTube Subscription Data Extractor - Extract channel information from YouTube subscription MHTML files, including subscriber counts, descriptions, and profile images wherever the saved page contains them.
Perfect for content creators, researchers, marketers, and anyone who needs to analyze their YouTube subscription data or create comprehensive channel databases.
- 100% Data Coverage - Extracts all channel information present in the saved page
- Comprehensive Fields - Channel name, channel ID, URL, profile image, subscriber count, and description
- Smart Subscriber Parsing - Handles both abbreviated (29.7K) and raw numbers (29700)
- Advanced Image Extraction - Recovers profile images from MHTML Content-Location headers
- MHTML Processing - Properly handles complex MHTML encoding and structure
- Efficient Processing - Handles large subscription lists (500+ channels)
- Multiple Export Formats - CSV, JSON, XML, SQL, and OPML output formats
- Error Recovery - Graceful handling of malformed or incomplete data
- Cross-Platform - Works on Windows, macOS, and Linux
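An MHTML file is a MIME `multipart/related` archive in which every saved resource (the HTML page, each image) is a separate part, usually tagged with a `Content-Location` header holding its original URL. That is why profile images can be recovered from the headers even when the HTML markup lacks them. The sketch below shows the idea with Python's standard `email` package. The helper name `image_locations` and the host filter `yt3.googleusercontent.com` (taken from the example image URLs in this README) are illustrative assumptions, not the extractor's actual code.

```python
import email
from email import policy
from typing import List


def image_locations(mhtml_bytes: bytes, host: str = "yt3.googleusercontent.com") -> List[str]:
    """Return the Content-Location URLs of MHTML parts served from `host`.

    Hypothetical helper sketching Content-Location recovery; it is not the
    extractor's own implementation.
    """
    message = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    urls = []
    # walk() yields the outer multipart/related container and every part inside it
    for part in message.walk():
        location = part.get("Content-Location")
        if location and host in location:
            urls.append(str(location))
    return urls


# A tiny hand-written MHTML archive: one HTML part and one image part
SAMPLE = b"""MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----BOUNDARY"

------BOUNDARY
Content-Type: text/html
Content-Location: https://www.youtube.com/feed/channels

<html><body>channels</body></html>
------BOUNDARY
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
Content-Location: https://yt3.googleusercontent.com/example=s176

/9j/
------BOUNDARY--
"""

print(image_locations(SAMPLE))
```

For a real export, pass the file's raw bytes (`open(path, "rb").read()`); reading in binary mode lets the MIME parser handle the per-part transfer encodings itself.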
- Clone the repository:

```bash
git clone https://github.com/abe238/youtube-subscription-extractor.git
cd youtube-subscription-extractor
```

- Run the installation script:

macOS/Linux:

```bash
./scripts/install.sh
```

Windows:

```bat
scripts\install.bat
```

- Test the installation:

```bash
python bin/extract.py --help
```

```bash
# Extract subscription data from MHTML file
python bin/extract.py path/to/subscriptions.mhtml
# Custom output file
python bin/extract.py subscriptions.mhtml --output my_channels.csv
# Export to different formats
python bin/extract.py subscriptions.mhtml --output data.json
python bin/extract.py subscriptions.mhtml --output channels.xml
python bin/extract.py subscriptions.mhtml --output database.sql
python bin/extract.py subscriptions.mhtml --output subscriptions.opml
# Specify output directory
python bin/extract.py subscriptions.mhtml --output-dir ./exports/
```

- Open YouTube in your browser (Chrome, Firefox, Safari, Edge)
- Go to your subscriptions page: https://www.youtube.com/feed/channels
- Save the page as MHTML/Web Archive:
  - Chrome: Ctrl/Cmd+S → Save as "Webpage, Single File" (MHTML)
  - Firefox: Ctrl/Cmd+S → Save as "Web Page, complete"
  - Safari: File → Export As → Web Archive
  - Edge: Ctrl/Cmd+S → Save as "Webpage, Single File" (MHTML)
- Use the saved file with this extractor

Other ways to save the page:

- Developer Tools: Right-click → Save as → Webpage Complete
- Browser Extensions: Use MHTML export extensions
- Command Line: Use tools like `wget` or `curl` with proper cookies
The extractor supports multiple output formats, either detected automatically from the file extension or specified explicitly:

- CSV (`.csv`) - Comma-separated values for spreadsheet applications
- JSON (`.json`) - Structured data with metadata for programmatic use
- XML (`.xml`) - Hierarchical markup format
- SQL (`.sql`) - Database insert statements with table creation
- OPML (`.opml`) - RSS feed list for RSS readers (Feedly, Reeder, etc.)
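The OPML export works because every YouTube channel ID maps to an RSS feed URL; the OPML sample in this README uses `https://youtube.com/feeds/videos.xml?channel_id=<ChannelID>`. The sketch below rebuilds that mapping with the standard library's `xml.etree.ElementTree`. The `build_opml` helper is hypothetical, and skipping channels without an ID is an assumption about sensible behaviour, not documented extractor behaviour.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Feed URL pattern copied from the OPML sample output in this README
FEED_URL = "https://youtube.com/feeds/videos.xml?channel_id={}"


def build_opml(channels: List[Dict[str, str]], title: str = "YouTube Subscriptions") -> str:
    """Hypothetical helper: turn extracted channel rows into an OPML 2.0 string."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for ch in channels:
        if not ch.get("ChannelID"):
            continue  # without a channel ID no feed URL can be built
        # Keyword arguments become XML attributes on the <outline> element
        ET.SubElement(
            body,
            "outline",
            type="rss",
            text=ch["ChannelName"],
            title=ch["ChannelName"],
            xmlUrl=FEED_URL.format(ch["ChannelID"]),
            htmlUrl=ch["ChannelLink"],
        )
    return ET.tostring(opml, encoding="unicode")


rows = [
    {"ChannelName": "AI For Humans", "ChannelID": "UCPjNBjflYl0-HQtUvOx0Ibw",
     "ChannelLink": "https://www.youtube.com/@AIForHumansShow"},
]
print(build_opml(rows))
```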
All formats include the following channel information:

| Column | Description | Example |
|---|---|---|
| ChannelName | Display name of the channel | "AI For Humans" |
| ChannelID | YouTube channel ID (UC...) | "UCPjNBjflYl0-HQtUvOx0Ibw" |
| ChannelLink | Full YouTube channel URL | "https://www.youtube.com/@AIForHumansShow" |
| ChannelImage | Profile image URL (176x176) | "https://yt3.googleusercontent.com/..." |
| SubscriberCount | Abbreviated subscriber count | "29.7K" |
| SubsCountRaw | Raw subscriber number | "29700" |
| ChannelDescription | Channel description text | "AI (Artificial Intelligence) made fun..." |
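The two subscriber columns are related: `SubsCountRaw` is the abbreviated `SubscriberCount` expanded into a plain integer. This README does not document exactly how the extractor performs that conversion, so the sketch below is an assumption: it handles English-UI suffixes K, M and B, strips thousands separators, and uses `Decimal` so the digits are scaled exactly. `parse_subscriber_count` is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Assumed suffix set for an English-language YouTube UI
MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_subscriber_count(text: str) -> Optional[int]:
    """Hypothetical helper: '29.7K', '1.2M subscribers' or '29,700' -> int (None if unparseable)."""
    # Keep only the first word and drop thousands separators
    token = text.strip().split(" ")[0].replace(",", "").upper()
    if not token:
        return None
    multiplier = 1
    if token[-1] in MULTIPLIERS:
        multiplier = MULTIPLIERS[token[-1]]
        token = token[:-1]
    try:
        # Decimal parses the digits exactly, so no binary float rounding happens before int()
        return int(Decimal(token) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        return None


for sample in ["29.7K", "1.2M subscribers", "29,700"]:
    print(sample, "->", parse_subscriber_count(sample))
```

Keep in mind that an abbreviated count is already rounded by YouTube, so a raw value derived from "29.7K" is only accurate to the displayed precision.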
CSV Format:

```csv
ChannelName,ChannelID,ChannelLink,ChannelImage,SubscriberCount,SubsCountRaw,ChannelDescription
AI For Humans,UCPjNBjflYl0-HQtUvOx0Ibw,https://www.youtube.com/@AIForHumansShow,https://yt3.googleusercontent.com/...,29.7K,29700,"AI made fun..."
```
JSON Format:

```json
{
  "metadata": {
    "export_date": "2024-09-08T12:00:00",
    "extractor_version": "1.2.0",
    "total_channels": 64,
    "channels_with_subscribers": 64,
    "channels_with_images": 8,
    "channels_with_descriptions": 52
  },
  "channels": [
    {
      "ChannelName": "AI For Humans",
      "ChannelID": "UCPjNBjflYl0-HQtUvOx0Ibw",
      "ChannelLink": "https://www.youtube.com/@AIForHumansShow",
      "ChannelImage": "https://yt3.googleusercontent.com/...",
      "SubscriberCount": "29.7K",
      "SubsCountRaw": "29700",
      "ChannelDescription": "AI made fun..."
    }
  ]
}
```

XML Format:

```xml
<?xml version="1.0" ?>
<youtube_channels>
  <metadata>
    <export_date>2024-09-08T12:00:00</export_date>
    <extractor_version>1.2.0</extractor_version>
    <total_channels>64</total_channels>
  </metadata>
  <channels>
    <channel>
      <channelname>AI For Humans</channelname>
      <channelid>UCPjNBjflYl0-HQtUvOx0Ibw</channelid>
      <channellink>https://www.youtube.com/@AIForHumansShow</channellink>
      <channelimage>https://yt3.googleusercontent.com/...</channelimage>
      <subscribercount>29.7K</subscribercount>
      <subscountraw>29700</subscountraw>
      <channeldescription>AI made fun...</channeldescription>
    </channel>
  </channels>
</youtube_channels>
```

SQL Format:

```sql
-- YouTube Channels Export
-- Generated on: 2024-09-08T12:00:00
CREATE TABLE IF NOT EXISTS youtube_channels (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    channel_name VARCHAR(255) NOT NULL,
    channel_id VARCHAR(30),
    channel_link VARCHAR(500) NOT NULL UNIQUE,
    channel_image VARCHAR(500),
    subscriber_count VARCHAR(20),
    subscriber_count_raw INTEGER,
    channel_description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO youtube_channels (channel_name, channel_id, channel_link, ...) VALUES
('AI For Humans', 'UCPjNBjflYl0-HQtUvOx0Ibw', 'https://www.youtube.com/@AIForHumansShow', ...);
```

OPML Format:

```xml
<?xml version="1.0" ?>
<opml version="2.0">
  <head>
    <title>YouTube Subscriptions</title>
    <dateCreated>Thu, 16 Oct 2025 02:31:29 GMT</dateCreated>
  </head>
  <body>
    <outline type="rss" text="AI For Humans" title="AI For Humans"
             xmlUrl="https://youtube.com/feeds/videos.xml?channel_id=UCPjNBjflYl0-HQtUvOx0Ibw"
             htmlUrl="https://www.youtube.com/@AIForHumansShow"/>
    <!-- More channels... -->
  </body>
</opml>
```

| Option | Description | Default |
|---|---|---|
| `input_file` | Path to YouTube subscriptions MHTML file | Required |
| `--output <file>` | Output filename (format auto-detected from extension) | `youtube_channels.csv` |
| `--format <fmt>` | Output format (csv, json, xml, sql) | Auto-detected from extension |
| `--output-dir <dir>` | Output directory path | Current directory |
| `--quality <mode>` | Data extraction quality (fast, comprehensive) | `comprehensive` |
| `--encoding <enc>` | Input file encoding | `utf-8` |
| `--verbose` | Enable detailed progress output | `false` |
| `--help` | Show help message | - |
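The `--output` and `--format` options interact: an explicit `--format` decides the format, otherwise the output file's extension does. A minimal sketch of that rule follows. The helper `resolve_format` is hypothetical, and falling back to CSV for an unknown or missing extension is an assumption modelled on the `youtube_channels.csv` default, not confirmed extractor behaviour.

```python
from pathlib import Path
from typing import Optional

# Extensions listed in this README's export-format section
EXTENSION_FORMATS = {".csv": "csv", ".json": "json", ".xml": "xml", ".sql": "sql", ".opml": "opml"}


def resolve_format(output: str, explicit: Optional[str] = None, default: str = "csv") -> str:
    """Hypothetical helper: an explicit --format wins; otherwise use the file extension."""
    if explicit:
        return explicit.lower()
    # Path.suffix is "" when the name has no extension, which falls through to the default
    return EXTENSION_FORMATS.get(Path(output).suffix.lower(), default)


print(resolve_format("data.json"))               # extension-based
print(resolve_format("results", explicit="json"))  # explicit --format
```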
```bash
# Basic extraction (CSV format)
python bin/extract.py subscriptions.mhtml
# Export to different formats (auto-detected)
python bin/extract.py subscriptions.mhtml --output data.json
python bin/extract.py subscriptions.mhtml --output channels.xml
python bin/extract.py subscriptions.mhtml --output database.sql
python bin/extract.py subscriptions.mhtml --output subscriptions.opml
# Explicit format specification
python bin/extract.py subscriptions.mhtml --output results --format json
# High-quality extraction with custom output
python bin/extract.py subscriptions.mhtml \
  --output my_subscriptions.csv \
  --quality comprehensive \
  --verbose
# Fast extraction for large files
python bin/extract.py large_subscriptions.mhtml \
  --quality fast \
  --output-dir ./results/
```

```
youtube-subscription-extractor/
├── bin/
│   └── extract.py                  # Main extraction script
├── scripts/
│   ├── install.sh                  # Unix installation script
│   ├── install.bat                 # Windows installation script
│   └── test.py                     # Installation verification
├── examples/
│   ├── sample_subscriptions.mhtml  # Example MHTML file
│   └── expected_output.csv         # Expected extraction result
├── docs/
│   ├── TROUBLESHOOTING.md          # Common issues and solutions
│   └── ADVANCED.md                 # Advanced usage patterns
├── requirements.txt                # Python dependencies
├── setup.py                        # Package installation
├── .gitignore                      # Git ignore patterns
└── README.md                       # This documentation
```
- Python: 3.7 or higher
- Operating System: Windows 10+, macOS 10.14+, or Linux
- Memory: 512MB RAM minimum (more for large subscription lists)
- Storage: 50MB for dependencies + space for output files
No third-party Python packages are required:
- No external dependencies - Uses only Python standard library
- Pure Python - No compiled extensions required
- Lightweight - Minimal resource usage
If automatic installation fails:
All Platforms:
```bash
pip install -r requirements.txt
```

Python 3 Specific:

```bash
pip3 install -r requirements.txt
```

Development Installation:

```bash
pip install -e .
```

```bash
# Check file path and permissions
ls -la path/to/subscriptions.mhtml
# Use absolute path
python bin/extract.py /full/path/to/subscriptions.mhtml
```

- Verify file format: Ensure the file is a complete MHTML/Web Archive
- Check subscription visibility: Make sure subscriptions are public on YouTube
- Re-export file: Try saving the YouTube page again with a different browser
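Before re-exporting, a quick structural check can tell whether the saved file is MHTML at all. The sketch below (a hypothetical `check_mhtml` helper, not part of this project) parses the raw bytes with Python's `email` package, which also sidesteps text-encoding questions, and then checks two things: that the top-level type is `multipart/related`, and that some part's `Content-Location` points at the subscriptions page. The second check assumes the browser records the page URL in that header, as Chromium-based browsers typically do.

```python
import email
from email import policy


def check_mhtml(data: bytes) -> str:
    """Hypothetical sanity check for a saved subscriptions page.

    Returns "ok", or a short description of the first problem found.
    """
    message = email.message_from_bytes(data, policy=policy.default)
    # A plain HTML file has no MIME headers, so the parser reports the default text/plain
    if message.get_content_type() != "multipart/related":
        return "not an MHTML archive (expected multipart/related)"
    locations = [str(part.get("Content-Location", "")) for part in message.walk()]
    if not any("youtube.com/feed/channels" in loc for loc in locations):
        return "no part saved from https://www.youtube.com/feed/channels"
    return "ok"


def check_file(path: str) -> str:
    # Read bytes, not text, so no --encoding choice is needed for the check itself
    with open(path, "rb") as fh:
        return check_mhtml(fh.read())
```

Usage: `print(check_file("subscriptions.mhtml"))`.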
```bash
# Try different encoding
python bin/extract.py subscriptions.mhtml --encoding utf-8-sig
python bin/extract.py subscriptions.mhtml --encoding latin1
```

```bash
# Use comprehensive mode (default)
python bin/extract.py subscriptions.mhtml --quality comprehensive --verbose
```

```bash
# Use fast mode for large subscription lists
python bin/extract.py large_file.mhtml --quality fast
```

For detailed troubleshooting:

```bash
python bin/extract.py subscriptions.mhtml --verbose
```

Windows:
- Use Command Prompt or PowerShell as Administrator if needed
- Ensure Python is in your PATH: `python --version`
- Try `py bin/extract.py` instead of `python bin/extract.py`

macOS:

- You may need to use `python3` instead of `python`
- Install Xcode Command Line Tools if needed: `xcode-select --install`
- For permission issues: `chmod +x scripts/install.sh`

Linux:

- Install Python 3 development headers: `sudo apt install python3-dev`
- For permission issues: `chmod +x scripts/install.sh`
- Try: `python3 bin/extract.py`
- Processing speed: 50-200 channels per second
- Memory usage: 50-200 MB (depends on file size)
- File size support: Up to 50MB MHTML files tested
- Channel count: Up to 1,000+ subscriptions
- File sizes: 1MB to 50MB MHTML files
- Data coverage: 95-100% for properly formatted MHTML files
- Use `--quality fast` for files with 500+ channels
- Process large files on systems with adequate RAM
- Use SSD storage for better I/O performance
Analyze your subscription feed for content strategy:

```bash
python bin/extract.py my_subscriptions.mhtml --output creator_analysis.csv
```

Build databases of channels in specific niches:

```bash
python bin/extract.py industry_subscriptions.mhtml --output market_research.csv
```

Extract data for YouTube ecosystem studies:

```bash
python bin/extract.py research_subscriptions.mhtml \
  --output research_data.csv \
  --quality comprehensive
```

Create spreadsheets of your subscriptions:

```bash
python bin/extract.py my_subs.mhtml --output personal_channels.csv
```

```python
import pandas as pd

# Load extracted data
df = pd.read_csv('youtube_channels.csv')

# Coerce raw counts to numbers; missing or malformed counts become NaN
df['SubsCountRaw'] = pd.to_numeric(df['SubsCountRaw'], errors='coerce')

# Basic statistics (mean() skips NaN values)
print(f"Total channels: {len(df)}")
print(f"Average subscribers: {df['SubsCountRaw'].mean():,.0f}")

# Top channels by subscriber count (nlargest needs a numeric column)
top_channels = df.nlargest(10, 'SubsCountRaw')
print(top_channels[['ChannelName', 'SubscriberCount']])
```

- Open the CSV file in Excel or Google Sheets
- Use pivot tables to analyze subscription patterns
- Create charts from subscriber count data
- Filter by description keywords
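Keyword filtering also works without a spreadsheet. The sketch below uses only the standard `csv` module; `filter_by_keywords` is a hypothetical helper, and the in-memory sample stands in for the exported file with just two of its columns.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def filter_by_keywords(rows: Iterable[Dict[str, Optional[str]]], keywords: List[str]) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: keep rows whose ChannelDescription contains any keyword.

    Matching is a case-insensitive substring test, so "ai" also matches "said".
    """
    wanted = [k.lower() for k in keywords]
    kept = []
    for row in rows:
        # Empty or missing descriptions are treated as ""
        description = (row.get("ChannelDescription") or "").lower()
        if any(k in description for k in wanted):
            kept.append(row)
    return kept


# In-memory stand-in for youtube_channels.csv (only two of its columns)
SAMPLE_CSV = "ChannelName,ChannelDescription\nAI For Humans,AI made fun...\nCooking Daily,Recipes and tips\n"
matches = filter_by_keywords(csv.DictReader(io.StringIO(SAMPLE_CSV)), ["ai"])
print([row["ChannelName"] for row in matches])
```

With a real export, open the file with `open("youtube_channels.csv", newline="", encoding="utf-8")` (UTF-8 output is an assumption) and pass `csv.DictReader(fh)` instead of the sample. Only the description is searched, which is why "Cooking Daily" is not matched by "ai".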
```sql
-- Import into SQLite (run inside the sqlite3 shell)
CREATE TABLE channels (
    name TEXT,
    channel_id TEXT,
    url TEXT,
    image TEXT,
    subscribers_formatted TEXT,
    subscribers_raw INTEGER,
    description TEXT
);
.mode csv
-- --skip 1 drops the CSV header row (SQLite 3.32.0 or newer)
.import --skip 1 youtube_channels.csv channels
```

This project helps creators and researchers access their own subscription data. Contributions welcome!
```bash
git clone https://github.com/abe238/youtube-subscription-extractor.git
cd youtube-subscription-extractor
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python scripts/test.py
```

```bash
# Run tests with example data (write to output.csv so diff can find it)
python bin/extract.py examples/sample_subscriptions.mhtml --output output.csv
# Verify output matches expected results
diff output.csv examples/expected_output.csv
```

Please include:
- Operating system and Python version
- Complete error message
- Sample MHTML file (if possible to share)
- Output from `python bin/extract.py --help`
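The operating system and Python version can be collected in one step with the standard `platform` and `sys` modules. This snippet is a convenience sketch, not a script shipped with the project; `environment_report` is a hypothetical name.

```python
import platform
import sys


def environment_report() -> str:
    """Summarize the OS and Python details requested in a bug report."""
    return "\n".join([
        f"OS: {platform.system()} {platform.release()}",
        f"Python: {platform.python_version()}",
        f"Executable: {sys.executable}",
    ])


print(environment_report())
```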
MIT License - see LICENSE file for details.
Intended Use: This tool is designed for extracting data from your own YouTube subscription lists for legitimate purposes such as:
- Personal organization and analysis
- Academic research on social media
- Content strategy development
- Data backup and archival
User Responsibility: Users must comply with:
- YouTube's Terms of Service
- Applicable privacy laws (GDPR, CCPA, etc.)
- Fair use guidelines
- Respect for creator privacy
Data Handling: This tool:
- Processes data locally on your machine
- Does not send data to external servers
- Only extracts publicly visible subscription information
- Does not bypass any privacy settings
The developers are not responsible for how users choose to use this software or any data extracted with it.
Built with:
- Python standard library - for reliable, dependency-free operation
- Real-world testing with diverse subscription lists
- Community feedback and use cases
Inspired by:
- The need for better subscription management tools
- Academic research requirements for social media data
- Content creator analytics needs
Perfect for content creators, researchers, marketers, and anyone who needs to organize and analyze their YouTube subscriptions.
- Support for other social media platforms (Instagram, Twitter, TikTok)
- Built-in data visualization and analytics
- Export to multiple formats (JSON, XML, SQL)
- Automated subscription monitoring and change detection
- Integration with popular analytics platforms
Star this repo if you find it useful!