r-salas/oshash

oshash

OpenSubtitles Hash implementation.

OSHash is designed for speed: unlike most hashing algorithms, it does not read the whole file. This makes it well suited to hashing large files.

Installation

The latest stable release can be installed from PyPI:

$ pip install oshash

API usage

Import oshash and call the oshash function with your file path.

import oshash

file_hash = oshash.oshash("/path/to/file")

Command usage

You can compute OSHash directly from the terminal.

$ oshash <file_path>

For example:

$ oshash /path/to/video.mp4
OSHash (/path/to/video.mp4) = d315edebf53a4af3

Comparison

The graphs below compare the hashing speed (in seconds) of OSHash with other algorithms for two different files:

[Graphs: 320p video (61.7 MB) and 1080p video (339.4 MB)]

You can create a comparison for any file with the following command:

$ python3 scripts/compare_algorithms.py <file_path>

To view the graphs, make sure matplotlib is installed.
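For a quick manual comparison without the script, the sketch below times a whole-file hash from Python's standard hashlib module. It is not the repository's scripts/compare_algorithms.py; the helper names hash_whole_file and time_call are hypothetical and exist only for this illustration.

```python
import hashlib
import time


def hash_whole_file(path, algorithm="md5", block_size=1024 * 1024):
    """Hash every byte of the file with a hashlib algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in fixed-size blocks so large files do not need to fit in memory.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling time_call(hash_whole_file, "/path/to/video.mp4", "sha1") returns the digest and the elapsed time. With the oshash package installed, time_call(oshash.oshash, "/path/to/video.mp4") times OSHash on the same file.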

How it works

In pseudo-code, the hash is computed in the following way:

file_buffer = open("/path/to/file")

head_checksum = checksum(file_buffer.head(64 * 1024))  # first 64 KB
tail_checksum = checksum(file_buffer.tail(64 * 1024))  # last 64 KB

# The sum wraps around at 64 bits
file_hash = file_buffer.size + head_checksum + tail_checksum

You can read more on the OpenSubtitles.org wiki.
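The pseudocode can be turned into a minimal runnable sketch. It assumes the checksum is the sum of each 64 KB chunk read as little-endian unsigned 64-bit integers, with all arithmetic wrapping at 64 bits, as in the OpenSubtitles description. The names oshash_sketch and chunk_checksum are hypothetical, and rejecting files smaller than 64 KB is an assumption of this sketch, not necessarily what the oshash package does.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024  # bytes read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF  # keeps sums within an unsigned 64-bit range


def chunk_checksum(chunk: bytes) -> int:
    """Sum the chunk as little-endian unsigned 64-bit words, modulo 2**64."""
    words = struct.unpack(f"<{len(chunk) // 8}Q", chunk)
    return sum(words) & MASK_64


def oshash_sketch(path: str) -> str:
    """Return the hash as 16 lowercase hex digits."""
    size = os.path.getsize(path)
    if size < CHUNK_SIZE:
        # Assumption: small files are rejected rather than padded.
        raise ValueError("file must be at least 64 KB")
    with open(path, "rb") as f:
        head = f.read(CHUNK_SIZE)
        f.seek(-CHUNK_SIZE, os.SEEK_END)  # jump to the last 64 KB
        tail = f.read(CHUNK_SIZE)
    total = (size + chunk_checksum(head) + chunk_checksum(tail)) & MASK_64
    return f"{total:016x}"
```

Only two 64 KB reads are needed regardless of file size, which is why the cost stays roughly constant as files grow.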

Acknowledgements

Thanks to the OpenSubtitles.org team for this algorithm.
