06. Training and test sets

Antonio Erdeljac edited this page Feb 26, 2019 · 1 revision

Training and test sets

Topic: Training and test sets

Course: GMLC

Date: 17 February 2019

Professor: Not specified

Resources

Key Points

When only a single data set is available, split it into a training set and a test set
Remove duplicates and randomize the data set before splitting. Otherwise the same example can land in both sets, so we accidentally train on test data and get a falsely low test loss
Test data characteristics
- Large enough to yield meaningful results
- Representative of the data set as a whole (randomize before splitting)
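
The key points above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `dedupe_shuffle_split`, the `test_fraction` and `seed` parameters, and the toy examples are all assumptions made for this sketch, and it assumes each example is hashable (for instance a tuple of feature and label).

```python
import random

def dedupe_shuffle_split(examples, test_fraction=0.2, seed=0):
    """Remove exact duplicates, shuffle, then split into (train, test).

    Hypothetical helper for illustration; assumes examples are hashable.
    """
    # dict.fromkeys keeps the first occurrence of each example, in order.
    unique = list(dict.fromkeys(examples))

    # Shuffle with a seeded generator so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Reserve at least one example for the test set.
    n_test = max(1, int(len(unique) * test_fraction))
    test = unique[:n_test]
    train = unique[n_test:]
    return train, test

# Toy data: (feature, label) pairs containing two exact duplicates.
examples = [(1.0, 0), (2.0, 1), (2.0, 1), (3.0, 0), (4.0, 1), (5.0, 0), (1.0, 0)]
train, test = dedupe_shuffle_split(examples, test_fraction=0.2, seed=42)

# No example appears in both sets, so the test loss is not
# lowered by examples the model already saw during training.
print(set(train).isdisjoint(test))
```

Removing duplicates before the split is what guarantees the two sets are disjoint; shuffling before the split is what makes the test set more likely to be representative. With a data set this small the test set is far too small to yield meaningful results, so treat the sizes here as placeholders only.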

Check your understanding

Know the characteristics of a good test set
Know what must be done to the data before splitting it

Summary of Notes

A single data set can be split into two subsets: a training set and a test set
Randomize the data and remove duplicates before splitting