06. Training and test sets
Topic: Training and test sets
Course: GMLC
Date: 17 February 2019
Professor: Not specified
- https://developers.google.com/machine-learning/crash-course/training-and-test-sets/video-lecture
- https://developers.google.com/machine-learning/crash-course/training-and-test-sets/splitting-data
- If you have only a single data set, split it into a training set and a test set
- Remove duplicates and randomize the data set before splitting; otherwise we might get a falsely low test loss (for example, duplicates that land in both sets mean we accidentally train on test data)
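A minimal sketch of this advice in plain Python, assuming each example is a hashable value (plain integers stand in for rows here) and an 80/20 split. The helper names `dedupe`, `shuffled_split` and `leaked_fraction` are hypothetical, not from the course. The sketch deduplicates, shuffles with a fixed seed, splits, and then measures how many test examples also appear in the training set.

```python
import random


def dedupe(examples):
    """Drop exact duplicates, keeping the first occurrence of each example."""
    return list(dict.fromkeys(examples))


def shuffled_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and return (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def leaked_fraction(train, test):
    """Fraction of test examples that also appear verbatim in the training set."""
    if not test:
        return 0.0
    train_set = set(train)
    return sum(x in train_set for x in test) / len(test)


# Hypothetical data: 100 distinct examples, each stored three times.
raw = list(range(100)) * 3

train, test = shuffled_split(raw)
print(leaked_fraction(train, test))  # typically close to 1: most test rows were also trained on

train, test = shuffled_split(dedupe(raw))
print(leaked_fraction(train, test))  # 0.0: no test example was trained on
```

Without deduplication, a model that simply memorized its training rows would look almost perfect on this test set, which is exactly the falsely low loss the note warns about.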
- Test data characteristics:
  - Large enough to yield statistically meaningful results
  - Representative of the data set as a whole (randomize before splitting)
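One way to check both properties, sketched under assumptions that are not from the course: representativeness is approximated by comparing each label's share in the test set with its share in the full data set, and "large enough" is judged by the binomial standard error of an accuracy estimate, sqrt(p * (1 - p) / n). The helper names and the spam/ham labels are hypothetical.

```python
import math
from collections import Counter


def label_proportions(labels):
    """Map each label to its share of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def max_proportion_gap(all_labels, test_labels):
    """Largest absolute difference in any label's share between the full data set and the test set."""
    full = label_proportions(all_labels)
    test = label_proportions(test_labels)
    return max(abs(full[label] - test.get(label, 0.0)) for label in full)


def accuracy_standard_error(accuracy, n_test):
    """Binomial standard error of an accuracy measured on n_test examples."""
    if n_test <= 0 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_test > 0 and 0 <= accuracy <= 1")
    return math.sqrt(accuracy * (1 - accuracy) / n_test)


labels = ["spam"] * 30 + ["ham"] * 70
print(max_proportion_gap(labels, ["ham"] * 10))  # about 0.3: a test set with no spam is not representative
print(accuracy_standard_error(0.9, 100))         # about 0.03
print(accuracy_standard_error(0.9, 10000))       # about 0.003
```

Going from 100 to 10,000 test examples shrinks the standard error tenfold, because it scales with 1 / sqrt(n).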
- Know the characteristics of the test data
- Know what is important before splitting the data
- We can create two subsets from one data set by splitting it
- It is important to randomize and remove duplicates before splitting