A Python tool to crawl websites and gather all valid URLs.
You can install the package after building it locally or from PyPI:

```
pip install link-finder
```

Once installed, you can use it from the command line as follows:
```
gather_urls -u https://example.com
```

This crawls the website starting from the provided URL and returns all valid URLs it finds, including URLs from the sitemap (if one is available).
- Crawls a website starting from the base URL.
- Parses sitemaps for additional URLs.
- Filters out invalid or unsupported file types (e.g., images, PDFs).
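The file-type filtering step above can be sketched as follows. This is an illustration only, not link-finder's actual implementation: the extension list and the helper name `is_supported` are assumptions for the example.

```python
from urllib.parse import urlparse

# Hypothetical set of unsupported extensions; the list link-finder
# actually uses may differ.
UNSUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip"}

def is_supported(url: str) -> bool:
    """Return True unless the URL path ends in an unsupported extension."""
    path = urlparse(url).path.lower()
    return not any(path.endswith(ext) for ext in UNSUPPORTED_EXTENSIONS)

print(is_supported("https://example.com/about"))      # True
print(is_supported("https://example.com/photo.png"))  # False
```

Checking only the URL path (rather than the full URL) avoids false positives from query strings such as `?file=report.pdf&page=2`.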
The following packages are required and will be installed automatically:
- hrequests
- usp (ultimate-sitemap-parser)
- courlan