diskcomp

Find and safely delete duplicate files, across two drives or within one. Zero dependencies, cross-platform, with undo logs.

📋 View the Roadmap: see what's planned for future versions

✨ Key Features

  • πŸ” Smart Detection β€” SHA256 hashing finds true duplicates regardless of filename
  • ⚑ Performance β€” Two-pass scan: filter by size first, hash only size-collision candidates
  • πŸ›‘οΈ Safety First β€” Always ask before deleting, create undo logs, detect read-only files
  • πŸ–₯️ Cross-Platform β€” macOS, Linux, Windows with native progress bars (Rich UI + ANSI fallback)
  • πŸ“Š Rich Reports β€” CSV/JSON output with file paths, sizes, hashes, and deletion recommendations
  • 🎯 Flexible Modes β€” Compare two drives, clean single drive, interactive deletion, batch operations
  • βš™οΈ Zero Dependencies β€” Pure Python, optional Rich UI, works everywhere Python runs
  • πŸ“¦ Multiple Install Options β€” pip, pipx, standalone binaries (Homebrew coming in v1.1)

📊 Project Status

diskcomp 1.0.0 is production-ready and actively maintained. The core deduplication engine has been tested with 285 comprehensive tests covering edge cases, cross-platform compatibility, and error handling.

  • ✅ Feature Complete: All planned v1.0 features implemented
  • ✅ Well Tested: 285 tests, CI on 3 platforms × 3 Python versions
  • ✅ Production Ready: Used for real data cleanup with safety guarantees
  • ✅ Cross-Platform: Native builds for macOS, Linux, Windows
  • ✅ Multiple Distribution Channels: PyPI, GitHub Releases (Homebrew coming in v1.1)

Quick Install

Download binary (no Python required):

macOS:

# Direct download (recommended)
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-macos
chmod +x diskcomp
./diskcomp --help

# Homebrew (coming in v1.1)
# brew tap w1lkns/diskcomp
# brew install diskcomp

Linux:

# Download directly  
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-linux
chmod +x diskcomp
./diskcomp --help

Windows:

# Download diskcomp-windows.exe from GitHub Releases
# https://github.com/w1lkns/diskcomp/releases/latest
diskcomp-windows.exe --help

Python install (if you have Python):

pipx (recommended; handles PATH automatically):

pipx install diskcomp
diskcomp --help

Don't have pipx? brew install pipx on macOS, pip install pipx elsewhere.

pip install:

pip install diskcomp
diskcomp --help

Single-file version (no install, no dependencies):

curl -O https://raw.githubusercontent.com/w1lkns/diskcomp/main/diskcomp.py
python3 diskcomp.py --help

Quick Start

Interactive mode (no arguments; clears the screen and shows a menu):

diskcomp

The launch menu offers:

  1) Compare two drives
  2) Clean up a single drive
  3) Load previous report
  4) Help
  5) Quit

Compare two drives (command-line):

diskcomp --keep /Volumes/backup --other /Volumes/external

Clean up a single drive (find internal duplicates):

diskcomp --single /Volumes/my-drive

Dry-run (count files without hashing):

diskcomp --keep /path/A --other /path/B --dry-run

Load a previous report (skip re-scanning):

diskcomp --delete-from ./diskcomp-report-20260322-235800.csv

📊 Example Output

Interactive mode startup:

 ██████╗ ██╗███████╗██╗  ██╗ ██████╗ ██████╗ ███╗   ███╗██████╗
 ██╔══██╗██║██╔════╝██║ ██╔╝██╔════╝██╔═══██╗████╗ ████║██╔══██╗
 ██║  ██║██║███████╗█████╔╝ ██║     ██║   ██║██╔████╔██║██████╔╝
 ██║  ██║██║╚════██║██╔═██╗ ██║     ██║   ██║██║╚██╔╝██║██╔═══╝
 ██████╔╝██║███████║██║  ██╗╚██████╗╚██████╔╝██║ ╚═╝ ██║██║
 ╚═════╝ ╚═╝╚══════╝╚═╝  ╚═╝ ╚═════╝ ╚═════╝ ╚═╝     ╚═╝╚═╝

 Find duplicates. Free space. Stay safe.
 v1.0.0

What would you like to do?
  1) Compare two drives
  2) Clean up a single drive  
  3) Load previous report
  4) Help
  5) Quit

Progress display:

Drive Health: Keep=/Volumes/Photos (2TB APFS), Other=/Volumes/Backup (4TB NTFS)
Scanning: ████████████████████████████████ 1,847 files found
Hashing candidates: ██████████████████████████████████ 234/234 files (23.4 MB/s)

Found 42 duplicates. You could free 1.2 GB from /Volumes/Backup. Ready to review?

πŸ›‘οΈ Safety Guarantees

Your files are safe. diskcomp prioritizes safety over convenience:

  • 🔒 No Automatic Deletion: Every file deletion requires explicit user confirmation
  • 📝 Undo Logs: Complete audit trail written before any file is deleted
  • ⚠️ Read-Only Detection: Automatically detects and warns about read-only drives
  • 🔍 Dry-Run Mode: Preview operations without any file system changes
  • ⏹️ Abort Anytime: Press Ctrl+C at any prompt to stop safely
  • ✨ Interactive Mode: Review each file individually before deletion
  • 🔐 SHA256 Verification: Cryptographic hashing ensures only true duplicates are identified

Usage & Flags

| Flag | Description | Example |
| --- | --- | --- |
| --keep PATH | Path to the "keep" drive (files to retain). Required unless interactive. | --keep /Volumes/backup |
| --other PATH | Path to the "other" drive (duplicates deleted from here). Required unless interactive. | --other /Volumes/external |
| --single PATH | Scan one drive for internal duplicates (redundant copies on the same drive). | --single /Volumes/photos |
| --dry-run | Walk and count files without hashing (quick preview). | --dry-run |
| --limit N | Hash only first N files per drive (testing only). | --limit 100 |
| --output PATH | Custom report path (default: ~/diskcomp-report-YYYYMMDD-HHMMSS.csv). | --output ./my-report.csv |
| --format csv\|json | Report format: csv or json (default: csv). | --format json |
| --min-size SIZE | Minimum file size to include (default: 1KB). Accepts bytes, KB, MB, GB. | --min-size 10MB |
| --delete-from PATH | Load an existing report and start deletion workflow (skip re-scanning). | --delete-from ./diskcomp-report-20260322.csv |
| --undo PATH | View the audit log of a previous deletion session. | --undo ./diskcomp-undo-20260322.json |
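
As an illustration of the kind of value `--min-size` accepts (bytes, KB, MB, GB), here is a small size parser. It is a sketch rather than diskcomp's own parser: the name `parse_size` is hypothetical, and the choice of binary units (1 KB = 1024 bytes) is an assumption, since the flag description does not say whether units are binary or decimal.

```python
import re

# Assumed multipliers; diskcomp may use decimal units (1 KB = 1000 bytes) instead.
_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert strings such as '512', '1KB' or '10MB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, unit = match.group(1), match.group(2).upper()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in {text!r}")
    return int(float(number) * _UNITS[unit])
```

Under these assumptions, `parse_size("10MB")` gives 10485760 and `parse_size("512")` gives 512.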

How It Works

  1. Drive Health Checks (pre-scan, two-drive mode):

    • Space summary for both drives
    • Filesystem detection (HFS+, NTFS, ext4, exFAT, etc.)
    • Read-only detection (warns if "keep" drive is read-only)
    • Read speed benchmark (128MB)
    • Optional SMART data (if smartmontools available)
  2. Scanning & Hashing:

    • Walks drives recursively
    • Skips OS noise (.DS_Store, Thumbs.db, System Volume Information, etc.)
    • Two-pass optimization: size-filter candidates first, then SHA256 hash
    • Live progress bar with speed and ETA
  3. Reporting:

    • CSV or JSON report saved to ~/diskcomp-report-YYYYMMDD-HHMMSS.{csv,json}
    • Atomic writes (temp → rename, safe against crashes mid-write)
  4. Deletion Workflow (optional):

    • Mode A (Interactive): Shows both copies numbered (1) and (2); you pick which to delete, skip, or abort. The running total of space freed is shown after each deletion.
    • Mode B (Batch): Dry-run preview with file type breakdown → type DELETE to confirm → progress bar
    • Undo log written before each deletion (audit-first pattern)
    • Always abortable with Ctrl+C
    • Can re-run from a saved report without re-scanning (option 3 in menu or --delete-from)
  5. Undo Log (--undo flag):

    • JSON file listing all deleted files with paths, sizes, hashes, and timestamps
    • Deletion is permanent: the log is an audit trail, not a restore mechanism
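
The two-pass scan from step 2 can be sketched in a few lines of standard-library Python. This is a minimal illustration of the technique (group by size, then hash only the collisions), not the project's implementation: `find_duplicates`, `sha256_of`, and the contents of `NOISE_NAMES` are assumptions for the example.

```python
import hashlib
import os
from collections import defaultdict

# Assumed noise list; the real tool's skip list may differ.
NOISE_NAMES = {".DS_Store", "Thumbs.db", "System Volume Information"}


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return {sha256: [paths]} for groups of two or more identical files."""
    # Pass 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for folder, dirnames, filenames in os.walk(root):
        # Prune noise directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in NOISE_NAMES]
        for name in filenames:
            if name in NOISE_NAMES:
                continue
            path = os.path.join(folder, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file

    # Pass 2: hash only the size-collision candidates.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[sha256_of(path)].append(path)
    return {h: sorted(p) for h, p in by_hash.items() if len(p) > 1}
```

The first pass needs only file metadata, so on a drive where most files have distinct sizes the expensive read-and-hash step touches a small fraction of the data.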

Reports

CSV format (default, spreadsheet-friendly):

status,original_file,duplicate_file,size_mb,verification_hash
DELETE_FROM_OTHER,/Volumes/keep/photos/pic1.jpg,/Volumes/other/photos/pic1.jpg,2.5,abc123...
UNIQUE_IN_KEEP,/Volumes/keep/docs/resume.pdf,,0.1,def456...
UNIQUE_IN_OTHER,,/Volumes/other/temp/junk.tmp,5.0,ghi789...

| Column | Values |
| --- | --- |
| status | DELETE_FROM_OTHER, UNIQUE_IN_KEEP, UNIQUE_IN_OTHER |
| original_file | Path to the copy to keep |
| duplicate_file | Path to the copy to delete |
| size_mb | File size in MB |
| verification_hash | SHA256 hex string |
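
Because the CSV report is plain tabular data, it can be post-processed with Python's standard `csv` module. The sketch below counts the duplicate rows and sums their sizes using the column names shown above; the helper name `summarize_report` is hypothetical.

```python
import csv


def summarize_report(csv_file):
    """Return (duplicate_count, reclaimable_mb) for an open CSV report."""
    count, total_mb = 0, 0.0
    for row in csv.DictReader(csv_file):
        # Only DELETE_FROM_OTHER rows represent space that could be freed.
        if row["status"] == "DELETE_FROM_OTHER":
            count += 1
            total_mb += float(row["size_mb"])
    return count, total_mb


# Usage, with a report path of your own:
#   with open("my-report.csv", newline="", encoding="utf-8") as fh:
#       print(summarize_report(fh))
```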

JSON format (programmatic use):

diskcomp --keep /Volumes/keep --other /Volumes/other --format json
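
Reports are described in How It Works as being written atomically (temp file, then rename). A minimal sketch of that pattern for a JSON report follows. The function name `write_report_atomically` is hypothetical, and the row structure is whatever the caller passes in, since the JSON layout diskcomp emits is not shown here.

```python
import json
import os
import tempfile


def write_report_atomically(rows, destination):
    """Write rows as JSON so readers see the old file or the new one, never half."""
    # The temp file must live on the same filesystem as the destination,
    # otherwise the final rename cannot be atomic.
    directory = os.path.dirname(os.path.abspath(destination))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(rows, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(temp_path, destination)  # atomic rename on POSIX and Windows
    except BaseException:
        # Leave no stray temp file behind if anything went wrong.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```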

Known Limitations

NTFS Drives on macOS and Linux

NTFS (Windows filesystem) drives are mounted read-only on macOS by default, and on Linux unless a read-write NTFS driver (such as ntfs-3g) is in use:

  • diskcomp can scan and identify duplicates on NTFS drives
  • diskcomp cannot delete files from NTFS drives without a third-party driver

Workaround:

diskcomp detects this and warns during health checks.
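
To check ahead of time whether a mounted drive accepts writes, one option is to try creating a temporary file on it. This is a sketch of that idea and not necessarily how diskcomp's health check works; `is_writable` is a hypothetical helper, and note that the probe briefly creates and removes a file on the target drive.

```python
import tempfile


def is_writable(directory):
    """Return True if a file can actually be created and removed in directory."""
    try:
        # The file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        # Covers read-only mounts, missing directories and permission errors.
        return False
    return True
```

For example, `is_writable("/Volumes/external")` would report whether deletions on that volume have a chance of succeeding.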

Optional Enhancements

Rich library, for professional progress bars and color styling:

pip install diskcomp[rich]

smartmontools, which enables SMART data display:

  • macOS: brew install smartmontools
  • Linux: apt-get install smartmontools or pacman -S smartmontools
  • Windows: wmic logicaldisk (built-in, no install needed)

Without these, diskcomp uses ANSI progress bars and skips SMART data.
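
The optional-dependency pattern described here (use Rich when it is installed, fall back to a plain bar otherwise) might look like the sketch below. The helper names `render_bar` and `with_progress` are hypothetical, and the fallback simply redraws one line with a carriage return; diskcomp's own fallback rendering may differ.

```python
import sys

try:
    from rich.progress import track  # optional dependency
except ImportError:
    track = None


def render_bar(done, total, width=30):
    """Build a plain-text progress bar followed by a done/total counter."""
    filled = width * done // total if total else width
    return "█" * filled + "░" * (width - filled) + f" {done}/{total}"


def with_progress(items, label="Hashing"):
    """Yield items while showing progress via Rich if present, else plain text."""
    items = list(items)
    if track is not None:
        yield from track(items, description=label)
        return
    for index, item in enumerate(items, start=1):
        # "\r" returns to the start of the line so the bar redraws in place.
        sys.stderr.write(f"\r{label}: {render_bar(index, len(items))}")
        sys.stderr.flush()
        yield item
    sys.stderr.write("\n")
```

Callers iterate over `with_progress(paths)` the same way in both cases, so the rest of the program does not need to know whether Rich is available.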

Cross-Platform Testing

CI validates diskcomp on 9 combinations:

  • macOS (latest) × Python 3.8, 3.10, 3.12
  • Linux (Ubuntu latest) × Python 3.8, 3.10, 3.12
  • Windows (latest) × Python 3.8, 3.10, 3.12

All tests pass and the single-file build is verified on each combination.

Development

Run tests locally:

python -m pytest tests/

Generate single-file version:

python build_single.py
python diskcomp.py --help

🤝 Support & Contributing

⭐ Like diskcomp? Star it on GitHub to show support!

License

MIT. See the LICENSE file for details.
