06. Training and test sets

Antonio Erdeljac edited this page Feb 26, 2019 · 1 revision


Topic: Training and test sets

Course: GMLC

Date: 17 February 2019

Professor: Not specified


Resources


Key Points


  • When only a single data set is available, split it into a training set and a test set

  • Remove duplicates and randomize the data set before splitting. Otherwise copies of the same example can land in both sets, so we accidentally train on test data and get a falsely low test loss

  • Test data characteristics

    • Large enough to yield statistically meaningful results

    • Representative of the data set as a whole, which is why we randomize before splitting
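
The key points above can be sketched in a few lines of Python. This is a minimal illustration and not the course's own code: the function name `split_train_test`, the `test_fraction` and `seed` parameters, and the toy data are assumptions made for the example, and it only removes exact duplicates of hashable examples.

```python
import random


def split_train_test(examples, test_fraction=0.2, seed=0):
    """Deduplicate, shuffle, then split examples into (train, test)."""
    # Remove exact duplicates while keeping first-seen order.
    # Assumes each example is hashable (e.g. a tuple of features and label).
    unique = list(dict.fromkeys(examples))

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Hold out the last test_fraction of the shuffled examples for testing.
    n_test = max(1, int(len(unique) * test_fraction))
    return unique[:-n_test], unique[-n_test:]


# Hypothetical toy data: ten distinct examples plus two duplicates.
examples = [(x, 2 * x) for x in range(10)] + [(3, 6), (7, 14)]
train, test = split_train_test(examples)
print(len(train), len(test))  # 8 2
```

Deduplicating before shuffling matters here: if the order were reversed, the two copies of `(3, 6)` could still end up on opposite sides of the split.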

Check your understanding


  • Know test data characteristics

  • Know what is important before splitting the data

Summary of Notes


  • We can create two subsets, a training set and a test set, by splitting one data set

  • It is important to randomize and remove duplicates before splitting
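
To see why duplicates matter, the following sketch counts how many test examples also appear in the training set. The helper `count_leaked` and the toy data are hypothetical, and shuffling is left out so the result stays deterministic.

```python
def count_leaked(train, test):
    """Count test examples that also appear in the training set."""
    train_set = set(train)
    return sum(1 for example in test if example in train_set)


# Hypothetical toy data: the example (1, 2) appears twice.
data = [(1, 2), (3, 6), (5, 10), (1, 2)]

# Splitting without deduplicating puts one copy on each side.
naive_train, naive_test = data[:3], data[3:]
print(count_leaked(naive_train, naive_test))  # 1

# Deduplicating first leaves no overlap between the two sets.
unique = list(dict.fromkeys(data))
clean_train, clean_test = unique[:2], unique[2:]
print(count_leaked(clean_train, clean_test))  # 0
```

Any nonzero count means the model has already seen part of the test set during training, so the measured test loss would be lower than it should be.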
