A predictive analytics project for ranking of health-violation risk in Wake County restaurants. For questions, please e-mail [email protected]
guyawn/FoodInspections
ECE 590-13: Food Inspections Prediction using SoTA NLP
------------------------------------------------------

This branch represents an extension of the original project, with work performed and presented for the final project of the Text Data and Analytics course of the Duke ECE Master's program.

The starting point for this data was a catalog of all restaurants in Wake County, North Carolina, found at https://catalog.data.gov/dataset/restaurants-in-wake-county-yelp. Using the Yelp Business Search API, these restaurants were queried to acquire much of the information available to Yelp users. This includes location data, review counts, pricing, category, and the text of the three most recent reviews.

Data were cleaned to remove non-restaurants, as well as any duplicates identified during the API pulls. Categories were one-hot encoded (note that some restaurants list multiple categories). Reviews were transformed into a bag-of-words representation, and the first 1000 features were selected. Note that both the categories and the reviews result in fairly sparse representations.

The scripts in the main directory will create the Wake County dataset. To run 5_EmbedNC, you'll need to run the scripts in the MakeEmbeddings directory as well. Scripts should be run in the specified order. Note that running them will take a couple of days, since you'll hit the Yelp rate limit during the data pull.

Files 6 and 7 contain the modelling. These require a GPU. I used Google Colab to run them; Colab works with Jupyter notebooks, which is why these are not standard Python files.

Contact Me
----------

If you have any additional questions about the data or the processes used to build them, please e-mail me at [email protected].
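
Example: Encoding Categories and Reviews
----------------------------------------

The category and review encoding described in this README can be illustrated with a minimal, standard-library-only sketch. This is not the code from the repository's scripts. The function names `multi_hot` and `bag_of_words`, the tokenizer, and the sample data are all hypothetical, and "the first 1000 features" is assumed here to mean the 1000 most frequent tokens, which the README does not confirm.

```python
import re
from collections import Counter


def multi_hot(categories_per_restaurant):
    """Encode each restaurant's category list as a 0/1 vector.

    A restaurant with several categories gets a 1 in each matching column,
    so rows can contain more than one non-zero entry.
    """
    vocab = sorted({c for cats in categories_per_restaurant for c in cats})
    index = {c: i for i, c in enumerate(vocab)}
    rows = []
    for cats in categories_per_restaurant:
        row = [0] * len(vocab)
        for c in cats:
            row[index[c]] = 1
        rows.append(row)
    return vocab, rows


def bag_of_words(texts, max_features=1000):
    """Count token occurrences per text, keeping at most max_features tokens.

    Assumption: features are ranked by total corpus frequency, with ties
    broken alphabetically so the result is deterministic.
    """
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    totals = Counter(tok for toks in tokenized for tok in toks)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ranked[:max_features]]
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            if tok in index:
                row[index[tok]] += 1
        rows.append(row)
    return vocab, rows


# Hypothetical sample data, not taken from the Wake County dataset.
categories = [["pizza", "italian"], ["burgers"], ["italian"]]
reviews = ["Great pizza, great service", "Slow service", "Great pasta"]

cat_vocab, cat_rows = multi_hot(categories)
bow_vocab, bow_rows = bag_of_words(reviews, max_features=3)
```

Most entries in both matrices are zero even for this toy input, which matches the sparsity noted in the README. With real data, a sparse matrix format would likely be a better fit than nested lists.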