Training models

In the previous section, we saw how to load and preprocess data in ML.NET. In this section, we'll continue working on our GitHub issue classifier by training the model.

We'll cover the following topics:

Adding a trainer to the machine learning pipeline
Training a model

Let's start by extending the machine learning pipeline from the previous section with a trainer.

Adding a trainer to the machine learning pipeline

In the previous section, we created a machine learning pipeline with preprocessing steps. Now it's time to extend that pipeline with a trainer for a multi-class classification model.

Open up the Program.cs file from the Trainer project and add the following code to the end of the Main method:

var trainer = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features");

var trainingPipeline = pipeline.Append(trainer).Append(
    mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

The code does the following:

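Creates a new trainer that uses the SDCA (Stochastic Dual Coordinate Ascent) algorithm with maximum entropy to train a multi-class classification model from the Label and Features columns. Like other ML.NET multi-class trainers, it expects Label to be a key-typed column.
Appends the trainer to the pipeline, followed by a MapKeyToValue transform. The trainer outputs PredictedLabel as a key (a numeric index into the set of label values); the mapping converts it back to the original label value so we can read it in the output.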
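To see what the MapKeyToValue step changes, here is a minimal sketch that trains the same trainer twice on a tiny, made-up dataset, once without the mapping and once with it, and compares the type of the PredictedLabel column. It assumes the Microsoft.ML NuGet package and a .NET 6+ console app with top-level statements. The Area and Title columns, the sample rows, and the MapValueToKey/FeaturizeText preprocessing are hypothetical stand-ins for the previous section's data and pipeline.

```csharp
// Minimal sketch: assumes the Microsoft.ML NuGet package and .NET 6+ top-level statements.
// The Area/Title columns, sample rows, and preprocessing steps are hypothetical
// stand-ins for the data and pipeline from the previous section.
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// Tiny tab-separated sample file: <area>\t<title>
var path = Path.Combine(Path.GetTempPath(), "issues-trainer-sample.tsv");
File.WriteAllLines(path, new[]
{
    "area-ui\tButton color is wrong on the settings page",
    "area-ui\tDialog text is cut off on small screens",
    "area-build\tBuild fails when restoring NuGet packages",
    "area-build\tCI pipeline times out during compilation",
});
var dataView = mlContext.Data.LoadFromTextFile(path, new[]
{
    new TextLoader.Column("Area", DataKind.String, 0),
    new TextLoader.Column("Title", DataKind.String, 1),
}, separatorChar: '\t');

// Hypothetical preprocessing: key-encode the area as Label, featurize the title.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Area")
    .Append(mlContext.Transforms.Text.FeaturizeText("Features", "Title"));

// Same trainer as in the tutorial, with the column names passed as named arguments.
var trainer = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(
    labelColumnName: "Label", featureColumnName: "Features");

// Without the mapping, PredictedLabel is a key column.
var withoutMapping = pipeline.Append(trainer).Fit(dataView).Transform(dataView);

// With MapKeyToValue, PredictedLabel holds the original area text again.
var withMapping = pipeline.Append(trainer)
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"))
    .Fit(dataView)
    .Transform(dataView);

var keyType = withoutMapping.Schema["PredictedLabel"].Type;
var valueType = withMapping.Schema["PredictedLabel"].Type;
File.Delete(path);

Console.WriteLine($"Without mapping: {keyType}");
Console.WriteLine($"With mapping:    {valueType}");
```

Only the schema is inspected here, so no predictions are actually computed; the point is that the label dictionary travels with the key type and MapKeyToValue uses it to turn indices back into readable values.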

With the training pipeline in place, let's run it and train a model.

Training a model

Now that we have a training pipeline, let's feed it some data to train the model. Add the following code to the end of the Main method in the Program.cs file:

var trainedModel = trainingPipeline.Fit(dataView);

The code calls the Fit method on the pipeline with the data view we created earlier. Fit runs the preprocessing steps and the trainer over the data and returns a trained model, which we can store on disk using the following code:
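The minimal sketch below runs Fit end to end on a six-row, made-up dataset and then applies the trained model with Transform to read the predicted areas back as strings. The sample file, the Area and Title columns, and the preprocessing steps are hypothetical stand-ins for the previous section's data; it assumes the Microsoft.ML NuGet package and a .NET 6+ console app with top-level statements.

```csharp
// Minimal sketch: assumes the Microsoft.ML NuGet package and .NET 6+ top-level statements.
// The Area/Title columns and sample rows are hypothetical stand-ins for the GitHub issue data.
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// Tiny tab-separated sample file: <area>\t<title>
var path = Path.Combine(Path.GetTempPath(), "issues-fit-sample.tsv");
File.WriteAllLines(path, new[]
{
    "area-ui\tButton color is wrong on the settings page",
    "area-ui\tDialog text is cut off on small screens",
    "area-ui\tDark theme icons are hard to see",
    "area-build\tBuild fails when restoring NuGet packages",
    "area-build\tCI pipeline times out during compilation",
    "area-build\tRelease build is missing native binaries",
});
var dataView = mlContext.Data.LoadFromTextFile(path, new[]
{
    new TextLoader.Column("Area", DataKind.String, 0),
    new TextLoader.Column("Title", DataKind.String, 1),
}, separatorChar: '\t');

var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Area")
    .Append(mlContext.Transforms.Text.FeaturizeText("Features", "Title"))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

// Fit fits each estimator in order, feeding it the output of the steps before it,
// and returns the resulting transformer chain: the trained model.
var trainedModel = trainingPipeline.Fit(dataView);

// Transform is lazy; ToArray forces the predictions to be computed.
var predictedAreas = trainedModel.Transform(dataView)
    .GetColumn<string>("PredictedLabel")
    .ToArray();
File.Delete(path);

foreach (var area in predictedAreas)
    Console.WriteLine(area);
```

Because an IDataView is evaluated lazily, the sketch materializes the predictions with ToArray before deleting the sample file.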

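using (var outputStream = File.Create("../Website/GithubClassifier.zip"))
{
    mlContext.Model.Save(trainedModel, dataView.Schema, outputStream);
}

This code performs the following steps:

We create the GithubClassifier.zip file in the website project and open a stream to it. File.Create truncates the file if it already exists, so retraining never leaves stale bytes from an older, larger model at the end of the archive.
Next, we save the trained model, together with the schema of the input data view, to the output stream.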

When you run the application, it generates a GithubClassifier.zip file in the website project. This file contains the trained model along with the schema of the data it expects as input.
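One way to check that the saved file is usable is to load it back with mlContext.Model.Load and compare its predictions with those of the in-memory model. The minimal sketch below does this in the temp folder rather than the website project; the sample data, column names, and file names are hypothetical, and it assumes the Microsoft.ML NuGet package and a .NET 6+ console app with top-level statements.

```csharp
// Minimal sketch: assumes the Microsoft.ML NuGet package and .NET 6+ top-level statements.
// Sample rows, column names, and file names are hypothetical.
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

var dataPath = Path.Combine(Path.GetTempPath(), "issues-save-sample.tsv");
File.WriteAllLines(dataPath, new[]
{
    "area-ui\tButton color is wrong on the settings page",
    "area-ui\tDialog text is cut off on small screens",
    "area-build\tBuild fails when restoring NuGet packages",
    "area-build\tCI pipeline times out during compilation",
});
var dataView = mlContext.Data.LoadFromTextFile(dataPath, new[]
{
    new TextLoader.Column("Area", DataKind.String, 0),
    new TextLoader.Column("Title", DataKind.String, 1),
}, separatorChar: '\t');

var trainedModel = mlContext.Transforms.Conversion.MapValueToKey("Label", "Area")
    .Append(mlContext.Transforms.Text.FeaturizeText("Features", "Title"))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"))
    .Fit(dataView);

// Save the model plus its input schema; File.Create overwrites any existing file.
var modelPath = Path.Combine(Path.GetTempPath(), "GithubClassifier.zip");
using (var outputStream = File.Create(modelPath))
{
    mlContext.Model.Save(trainedModel, dataView.Schema, outputStream);
}

// Load the model back; inputSchema is the schema that was passed to Save.
ITransformer loadedModel = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);

var original = trainedModel.Transform(dataView)
    .GetColumn<string>("PredictedLabel").ToArray();
var reloaded = loadedModel.Transform(dataView)
    .GetColumn<string>("PredictedLabel").ToArray();
File.Delete(dataPath);

Console.WriteLine($"Input columns: {string.Join(", ", inputSchema.Select(c => c.Name))}");
Console.WriteLine($"Predictions match: {original.SequenceEqual(reloaded)}");
```

Saving dataView.Schema alongside the model is what lets Load hand the input schema back, so a consuming application can see which columns the model expects.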

In a later section, we'll use the model in the ASP.NET Core website project.

Summary

In this section of the tutorial, we extended the preprocessing pipeline with a trainer to create a training pipeline. We then used that pipeline to train a model and saved the model to disk.

In the next section, we'll extend the code with validation logic to test how well the model performs.

Next section
