Yet another dataset about Movies, TV Shows and Games.
This is an implementation of the Criticker Dataset. This repository contains the necessary spiders for creating the dataset, along with some basic tests.
The great_expectations tool is used for data quality checks; see the datadocs.
Poetry is used for virtual environment and dependency management.
poetry install
poetry run scrapy crawl games_spider -o data/raw/games.csv # to retrieve games
# export login username and password
export C_USERNAME='<USERNAME>'
export C_PASSWORD='<PASSWORD>'
poetry run scrapy crawl movies_spider -o data/raw/movies.csv # to retrieve movies

poetry run pytest

- Add games
- TCI related data
- Add reviews
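
The movies spider relies on the `C_USERNAME` and `C_PASSWORD` environment variables exported above. As a rough illustration only, a helper along these lines could read them and fail early with a clear message when one is missing. The function name `load_credentials` is hypothetical and is not taken from this repository; the actual spider may read the variables differently.

```python
import os


def load_credentials(environ=os.environ):
    """Return (username, password) read from C_USERNAME and C_PASSWORD.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Collect every variable that is unset or empty so the error names them all.
    missing = [
        name for name in ("C_USERNAME", "C_PASSWORD") if not environ.get(name)
    ]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return environ["C_USERNAME"], environ["C_PASSWORD"]
```

Calling `load_credentials()` before starting a crawl would surface a missing export immediately instead of as a failed login later on.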
