
Overfitting

Overfitting is when your supervised learning model becomes too focused on the training data you have. When this happens, your model will be great at predicting your training data, but it will do poorly on data it has not seen before, especially near the decision boundary.

You can generally spot overfitting as a big gap between training and test scores: the training score is high while the test score is low.
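A minimal sketch of checking for that gap, assuming scikit-learn; the generated dataset and the decision tree are just illustrative choices, not from the original notes:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy dataset (assumption: any labeled data works here).
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training data, so it tends to overfit.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("train score:", model.score(X_train, y_train))  # typically close to 1.0
print("test score: ", model.score(X_test, y_test))    # noticeably lower
```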


Preventing overfitting

You can prevent overfitting by splitting your data into different groups. The biggest part becomes the training data: the data the model actually learns from. Then you have the test data or development data, which you use to assess the model's performance during training. Finally, you need a third set of data, the validation set, to check that you did not overfit on the test data.
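A minimal sketch of such a three-way split, assuming scikit-learn; the 60/20/20 proportions are just an example, not something the notes prescribe:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Illustrative toy dataset (assumption).
X, y = make_classification(n_samples=500, random_state=0)

# First hold out 20% as the validation set, only touched at the very end.
X_rest, X_val, y_rest, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Split the remaining 80% into training data and test/development data
# (0.25 of the remainder = 20% of the original, giving a 60/20/20 split).
X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
```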

The knowledge clip uses different names for these sets. See the image.

(image: slide from the knowledge clip with its naming of the data splits)

K-fold cross validation

Here you split the data into k portions of the same size, then iteratively train on k-1 of the subsets and test on the remaining subset. Finally, you average the scores of the k runs. K is usually something like 5 or 10.
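A minimal sketch of 5-fold cross-validation, assuming scikit-learn; the logistic regression model and the toy dataset are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative toy dataset (assumption).
X, y = make_classification(n_samples=500, random_state=0)

# cv=5 gives 5 equal folds; each run trains on 4 folds and tests on the fifth.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())  # the k scores and their average
```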
