v0.8.0
Spider
Changelog:
- v0.8.0 by @cyclone-github in #5
- added flag "-file" to allow creating ngrams from a local plaintext file (ex: foobar.txt)
- added flag "-timeout" for -url mode
- added flag "-sort" which sorts output by frequency
- fixed several small bugs
- https://github.com/cyclone-github/spider/blob/main/CHANGELOG.md
You can also use the -file and -sort flags to frequency-sort and deduplicate wordlists that contain duplicates.
ex: spider -file foobar.txt -sort
This optimizes wordlists by sorting them by probability, with the most frequently occurring words listed at the top.
Keep in mind that when using -file and -sort:
- This only applies to wordlists that contain duplicates
- Sorting large wordlists is RAM intensive
- This feature is in beta, so results may vary
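Conceptually, the -file/-sort combination boils down to counting duplicate lines and emitting each unique word once, ordered by descending frequency. The sketch below is a minimal illustration of that idea in Go, not Spider's actual source; the function name and tie-breaking behavior (first-seen order) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// frequencySort returns each unique word once, most frequent first.
// Ties keep first-seen order thanks to the stable sort.
func frequencySort(words []string) []string {
	counts := make(map[string]int)
	order := make([]string, 0) // unique words in first-seen order
	for _, w := range words {
		if counts[w] == 0 {
			order = append(order, w)
		}
		counts[w]++
	}
	sort.SliceStable(order, func(i, j int) bool {
		return counts[order[i]] > counts[order[j]]
	})
	return order
}

func main() {
	wordlist := []string{"password", "admin", "password", "letmein", "password", "admin"}
	fmt.Println(frequencySort(wordlist)) // [password admin letmein]
}
```

Note that holding the count map in memory is what makes sorting large wordlists RAM intensive.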
Spider is a web crawler and wordlist/ngram generator written in Go that crawls specified URLs or local files to produce frequency-sorted wordlists and ngrams. Users can customize crawl depth, output files, frequency sorting, and ngram options, making it ideal for web scraping to create targeted wordlists for tools like hashcat or John the Ripper. Spider combines the web-scraping capabilities of CeWL with ngram generation, and since Spider is written in Go, it requires no additional libraries to download or install.
Spider just works.
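A word-level ngram for a range like "-ngram 1-3" is simply every run of 1 to 3 consecutive words in the input text. The following Go sketch shows the idea; it is an illustrative approximation, not Spider's implementation, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ngrams emits every run of min..max consecutive words in text.
// For min=1, max=3 this produces unigrams, bigrams, and trigrams.
func ngrams(text string, min, max int) []string {
	words := strings.Fields(text)
	var out []string
	for n := min; n <= max; n++ {
		for i := 0; i+n <= len(words); i++ {
			out = append(out, strings.Join(words[i:i+n], " "))
		}
	}
	return out
}

func main() {
	// "the quick brown fox" yields 4 unigrams + 3 bigrams + 2 trigrams = 9 ngrams
	for _, g := range ngrams("the quick brown fox", 1, 3) {
		fmt.Println(g)
	}
}
```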
Spider: URL Mode
spider -url 'https://forum.hashpwn.net' -crawl 2 -delay 20 -sort -ngram 1-3 -timeout 1 -o forum.hashpwn.net_spider.txt
----------------------
| Cyclone's URL Spider |
----------------------
Crawling URL: https://forum.hashpwn.net
Base domain: forum.hashpwn.net
Crawl depth: 2
ngram len: 1-3
Crawl delay: 20ms (increase this to avoid rate limiting)
Timeout: 1 sec
URLs crawled: 56
Processing... [====================] 100.00%
Unique words: 3164
Unique ngrams: 17313
Sorting n-grams by frequency...
Writing... [====================] 100.00%
Output file: forum.hashpwn.net_spider.txt
RAM used: 0.03 GB
Runtime: 8.634s
Spider: File Mode
spider -file kjv_bible.txt -sort -ngram 1-3
----------------------
| Cyclone's URL Spider |
----------------------
Reading file: kjv_bible.txt
ngram len: 1-3
Processing... [====================] 100.00%
Unique words: 35412
Unique ngrams: 877394
Sorting n-grams by frequency...
Writing... [====================] 100.00%
Output file: kjv_bible_spider.txt
RAM used: 0.13 GB
Runtime: 1.359s
Wordlist & ngram creation tool that crawls a given URL or processes a local file to create wordlists and/or ngrams (depending on the flags given).
Usage Instructions:
- To create a simple wordlist from a specified URL (saves a deduplicated wordlist to url_spider.txt):
spider -url 'https://github.com/cyclone-github'
- To set a URL crawl depth of 2 and create ngrams of len 1-5, use flags "-crawl 2" and "-ngram 1-5"
spider -url 'https://github.com/cyclone-github' -crawl 2 -ngram 1-5
- To set a custom output file, use flag "-o filename"
spider -url 'https://github.com/cyclone-github' -o wordlist.txt
- To set a delay to keep from being rate-limited, use flag "-delay n" where n is time in milliseconds
spider -url 'https://github.com/cyclone-github' -delay 100
- To set a URL timeout, use flag "-timeout n" where n is time in seconds
spider -url 'https://github.com/cyclone-github' -timeout 2
- To create ngrams of len 1-3 and sort output by frequency, use flags "-ngram 1-3" and "-sort"
spider -url 'https://github.com/cyclone-github' -ngram 1-3 -sort
- To process a local text file, create ngrams of len 1-3, and sort output by frequency
spider -file foobar.txt -ngram 1-3 -sort
- Run "spider -help" to see a list of all options
4c80bc2f26e9ebd9445bac46315868dde8ba38374db4ef9c770c066ccc43a091 spider_amd64.bin
9ca7048f7b18ca3502fe84b1a2654a6d0ab23ca4a54996d90223a62f1bf4ca23 spider_arm64.bin
49bfab2856bfc95d8744e89ebacbe17f69ed04287f499640cea3564115931d34 spider_arm.bin
c4a6aa4de95ed3522f3a2e731eefabda55060a971d72094059b01ee118c1cff7 spider_amd64.exe
Jotti Antivirus Scan Results
Antivirus False Positives:
- Several antivirus programs on VirusTotal incorrectly flag Go-compiled binaries as malicious (false positives). This issue primarily affects the Windows executable, but is not limited to it. If this concerns you, I recommend carefully reviewing the source code and then compiling the binary yourself.
- Uploading your compiled binaries to https://virustotal.com and leaving an upvote or comment would be helpful as well.