Model Selection and Performance Boosting with k-Fold Cross Validation and XGBoost


Summary

Splitting the dataset into a training set and a test set is one of the basic, essential steps in testing how our machine learning model performs on a dataset. It helps us determine whether the model is able to predict the outcomes. To determine whether our model is a good fit for a given dataset (that is, whether it is overfitting or underfitting), we need to test it on an unseen dataset, or validation dataset. Cross validation is a technique that sets aside part of our dataset (validation