This week we will learn about a few ideas that will help you carry out the development part of your thesis project more successfully. We will cover the following tools and techniques:
- Bash commands
- Git commands
- Hyperparameter tuning
- Logging relevant information when training models
This week's exercise has several learning goals. You will become familiar with how to:
- Use GitHub to manage your code
- Use bash commands and git commands to add new changes to your code while keeping track of them
- Train a machine learning model on your personal computer
- Add hyperparameter tuning to your code
- Log the hyperparameters and metrics to track your experimental settings
Today's exercise consists of several steps that must be executed sequentially to produce the final code, which performs hyperparameter tuning for your machine learning model.
- Step 1: Fork a public GitHub repository to your own GitHub account
- Step 2: Clone the git repository to your local environment
- Step 3: Make the instructed changes to your code to introduce hyperparameter tuning
- Step 4: Add logging to record the hyperparameters and the metrics
- Step 5: Commit the changes and push them to the remote git repository
- Step 6: Submit a pull request
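Step 3 introduces hyperparameter tuning with grid search. Conceptually, grid search tries every combination of candidate values and keeps the one with the best validation score. The sketch below illustrates only that idea using the standard library; the grid values and the toy_score function are hypothetical stand-ins for training and evaluating a real XGBoost model. scikit-learn's GridSearchCV performs the same exhaustive enumeration but also cross-validates each candidate, so treat this as an illustration rather than a replacement for it.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List],
    score_fn: Callable[[Dict], float],
) -> Tuple[Dict, float]:
    """Evaluate every combination in param_grid and return the best one.

    score_fn is assumed to return a score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product(...) yields one tuple per combination of candidate values
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # strict ">" keeps the first of any tied candidates
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the names mirror common XGBoost hyperparameters.
param_grid = {"max_depth": [3, 5, 7], "learning_rate": [0.01, 0.1]}


def toy_score(params: Dict) -> float:
    # Hypothetical stand-in for "train with params, return validation accuracy".
    return -abs(params["max_depth"] - 5) + params["learning_rate"]


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)  # prints {'max_depth': 5, 'learning_rate': 0.1} 0.1
```

With 3 values for max_depth and 2 for learning_rate, six candidates are evaluated; the cost of grid search grows multiplicatively with every hyperparameter you add to the grid.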
- Sign in to your GitHub account, or register with GitHub here
- Fork the tutorial code found here to create your own version of the repository.
- First, install Git on your local machine. You can find instructions for doing this here.
- Then go to your desired folder and create a directory named hyperparameter_tuning_tutorial using bash commands.
- Use the relevant git command to clone the repository into the directory you just created.
- Finally, create a new branch named hyperparams_and_logging from the master branch of your repository before making any changes to the code.
- Open this notebook in Jupyter Notebook.
- Instructions for installing Anaconda, which includes the Jupyter Notebook editor, can be found here.
We will use several Python modules today, so let us import them here. We use numpy and pandas for data manipulation, scikit-learn for splitting the data into training and test sets and for implementing grid search, xgboost for the machine learning model itself, and the logging library to log information.
Hint: You can use the pip Python package manager to install these libraries in your local Python environment:
pip install numpy
pip install pandas
pip install scikit-learn
pip install xgboost
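After running the pip commands, it can help to confirm that the packages are importable from the same Python environment that runs your notebook. The helper below is a small sketch; the function name missing_packages and the REQUIRED mapping are our own, not part of any library. Note that import names can differ from pip names (scikit-learn is imported as sklearn), and logging is part of the standard library, so it needs no installation.

```python
import importlib.util
from typing import Dict, List

# pip package name -> name used in "import ..." statements
REQUIRED: Dict[str, str] = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages that cannot be found in this environment."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages()
if missing:
    print("Still to install:", " ".join(missing))
else:
    print("All required packages are available.")
```

If a package is reported missing even though pip installed it, pip and Jupyter are probably using different Python environments.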
- Instead of printing the best set of hyperparameter values, use logging to add log messages that can capture these values.
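A minimal sketch of this change, assuming your notebook stores the fitted search object in a variable such as grid_search (a hypothetical name). GridSearchCV exposes the winning combination as best_params_ and its mean cross-validated score as best_score_, so you can pass those values to a logger instead of printing them. The logger name and the file name experiments.log are assumptions; writing to a file keeps a timestamped record of each run's settings.

```python
import logging


def setup_experiment_logger(log_path: str = "experiments.log") -> logging.Logger:
    """Create a logger that appends timestamped messages to log_path."""
    logger = logging.getLogger("hyperparameter_tuning")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Avoid duplicate handlers when a notebook cell is re-run; note that a later
    # call with a different log_path keeps writing to the first file.
    if not logger.handlers:
        handler = logging.FileHandler(log_path)  # default mode "a" appends
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_search_results(logger: logging.Logger, best_params: dict, best_score: float) -> None:
    """Log each best hyperparameter value and the corresponding score."""
    for name, value in sorted(best_params.items()):
        logger.info("Best %s: %s", name, value)
    logger.info("Best cross-validated score: %.4f", best_score)


logger = setup_experiment_logger()
# In the notebook you would pass grid_search.best_params_ and grid_search.best_score_;
# the values below are placeholders for illustration only, not real results.
log_search_results(logger, {"max_depth": 5, "learning_rate": 0.1}, 0.85)
```

Logging each hyperparameter on its own line makes the log easy to search later, for example when comparing runs before opening your pull request.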
- Use the relevant git command to commit the code to the local copy of your repository.
- Use the relevant git command to push the committed code to the hyperparams_and_logging branch of the remote repository (on GitHub).
- Use the GitHub web interface to submit a pull request to your repository. Details can be found here.
Hint: Details about pushing the new code to the branch can also be found here.