[meta title:"Final Project" description:"Short description of your project" /]
[Header
fullWidth:true
title:"Machine Learning Classification"
authors:`[{
name: "Laura Howard",
link: "https://www.linkedin.com/in/laura-m-howard/"
}, {
name: "Christian Deverall",
link: "https://www.linkedin.com/in/christian-deverall-cmu/"
},
{
name: "Nathan Jen",
link: "https://www.linkedin.com/in/nathanjen/"
},
{
name: "Juliette Wong",
link: "https://www.linkedin.com/in/juliettenwong/"
}]`
date:`(new Date()).toDateString()`
background:"#222222"
color:"#ffffff"
/]
[Scroller currentStep: intro]
[Step]
# A Brief History of Classification
[aside]
[img src:"static/images/machine_learning_cartoon.png" /]
[/aside]
[p className:'introSubtitle']Can computers learn from data?[/p]
This question led to the development of machine learning. In 1959, Arthur Samuel (a pioneering computer scientist)
defined machine learning as “the field of study that [b]gives computers the ability to learn without being explicitly programmed[/b]”.
While this may sound complex, we hope to make some machine learning concepts approachable and easy to learn
through examples and activities in this article.
[/Step]
[Step]
[p className:'introSubtitle']Understanding Algorithms[/p]
[aside]
[img src:"static/images/ml_books.png" /]
[/aside]
Think of it this way: just as you can learn these concepts by experimenting with examples, computers can do the same!
To grasp how machine learning works, we need to understand [b]algorithms, the methods for creating models from data[/b].
In this article, we will show you popular algorithms that perform classification tasks on a simple dataset related to income.
[/Step]
[/Scroller]
[var name:"case" value:0 /]
[Scroller currentStep: case]
[Graphic className:"d3-component-container"]
[img src:"static/images/students.png" /]
[/Graphic]
[Step]
# Case Study
[p className:'introSubtitle']Is Income Predictable?[/p]
You are starting your senior year of high school and are [b]trying to decide whether to further your education beyond graduation[/b].
You are wondering whether the long-run returns will be high enough to let you live a comfortable life and pay off student loans.
To find out, you look at US census data and gather real examples of people's:
[p className:'numbered-list']
1. Age [br/]
2. Number of years of education [br/]
3. What their income is [br/]
[/p]
You classify income as high if it exceeds $50,000 (the data comes from the 1994 US Census Bureau database; $50,000 is worth ~$88,000 in 2020).
Let's take a quick look at the data below:
[data name:"train" source:"train_data.csv" /]
[Table data:train defaultPageSize:8/]
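If you want to follow along in code, here is a minimal pandas sketch of loading and inspecting this table. The file name and column names match the dataset shown above; everything else is illustrative.

```python
# A minimal sketch, assuming train_data.csv has the columns shown above:
# "Age", "Years of Education", and "High Income".
import pandas as pd

df = pd.read_csv("train_data.csv")
print(df.head(8))                         # the same preview as the table above
print(df["High Income"].value_counts())   # 16 "High" and 16 "Low" rows
```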
[/Step]
[/Scroller]
[Scroller currentStep: eda]
[Step]
[p className:'introSubtitle']Exploring the dataset[/p]
As you probably realized, [b]looking at raw data isn't very useful.[/b] So now let's look at visualizations to see what
more we can learn from this dataset!
[IdyllVegaLite data:train spec:`{
renderer: 'svg',
mark: "bar",
encoding: {
y: {field: "High Income"},
x: {aggregate: "count", title: "Number of People with High Income"},
color: {field: "High Income", type: "nominal"},
}
}`/]
First, let’s look at a simple bar chart showing the number of people with and without high income.
As you can see, there is an equal number of people (16) with high and low income in the dataset,
totaling 32 records.
[aside]
Color this chart by:
[var name:"colorBy" value:`{value: "#696969"}` /]
[button onClick:`colorBy = {field: 'High Income', type: 'nominal'}` ]High Income[/button]
[button onClick:`colorBy = {value: "#696969"}` ]None[/button]
[/aside]
[IdyllVegaLite data:train spec:`{
renderer: 'svg',
mark: "bar",
encoding: {
x: {field: "Age", bin: true},
y: {aggregate: "count", title: "Number of People"},
color: colorBy
}
}` /]
Next, let’s look at the spread of people’s ages. When the High Income button on the right is selected, you can see
the ages of the people who have a high income.
[Conditional if:`colorBy.field`]
As you can see, it seems as if [b]older people are more likely to be high income[/b]. This makes
sense as more experience usually means a better salary!
[/Conditional]
[aside]
Color this chart by:
[var name:"colorBy2" value:`{value: "#696969"}` /]
[button onClick:`colorBy2 = {field: 'High Income', type: 'nominal'}` ]High Income[/button]
[button onClick:`colorBy2 = {value: "#696969"}` ]None[/button]
[/aside]
[IdyllVegaLite data:train spec:`{
renderer: 'svg',
mark: "bar",
encoding: {
x: {field: "Years of Education", axis: {labelAngle: 0}},
y: {aggregate: "count", title: "Number of People"},
color: colorBy2
}
}` /]
This graph shows the number of people by years of education. When the High Income button is selected, you
can see how many years of education people with a high income have.
[Conditional if:`colorBy2.field`]
As you can see from this graph, those with [b]high income tend to be the ones with more years of education[/b].
However, [b]this is not always the case[/b], as you will see some low income individuals
with 9, 10, 11, and 13 years of education.
[/Conditional]
[IdyllVegaLite data:train spec:`{
renderer: 'svg',
mark: "circle",
encoding: {
x: {field: "Age", type: "quantitative"},
y: {field: "Years of Education", type: "quantitative"},
color: {field: "High Income", type: "nominal"},
tooltip: {field: "High Income", type: "nominal"}
}
}` /]
Finally, let’s look at the age and education variables together in a scatterplot.
After looking at the data, do you think more years of schooling truly result in a higher income for you?
[/Step]
[/Scroller]
[Scroller currentStep: ml]
[Graphic className:"d3-component-container"]
[img src:"static/images/mystery-person.png" /]
[/Graphic]
[Step]
[p className:'introSubtitle']Evaluating Your Model[/p]
After exploring the data, we hope that you got a sense of which patterns are more likely
to indicate that someone is high income. By doing this, [b]you were actually training your own model[/b], which is
almost exactly what machine learning models do! Now that your model is trained, let's put it to the test
and have you predict the income of 3 mystery people.
[div className:'person-section']
[b]Mystery Person #1:[/b] [br/]
Age: 57 [br/]
Years of Education: 15
[var name:"correct1" value:`false` /]
[var name:"show1" value:`false` /]
[div className:'btn-row' onClick:`show1 = true`]
[button className:'evaluate-btn' onClick:`correct1 = true`]High Income[/button]
[button className:'evaluate-btn' onClick:`correct1 = false`]Low Income[/button]
[/div]
[Answer correct:correct1 show:show1]
This mystery person is both on the older side and well educated. As you can see in the graph below, s/he is surrounded
by high income individuals, so there is a high likelihood that this person follows suit. [br/]
[data name:"mystery_one" source:"mystery1.csv" /]
[IdyllVegaLite data:mystery_one spec:`{
renderer: 'svg',
mark: "circle",
encoding: {
x: {field: "Age", type: "quantitative"},
y: {field: "Years of Education", type: "quantitative"},
color: {field: "High Income", type: "nominal", scale: { domain: ["High", "Low", "Test"], range: ["#ff7f0f", "#1e77b4", "#000000"]}}
}
}` /]
[/Answer]
[/div]
[div className:'person-section']
[b]Mystery Person #2:[/b] [br/]
Age: 22 [br/]
Years of Education: 7
[var name:"correct2" value:`false` /]
[var name:"show2" value:`false` /]
[div className:'btn-row' onClick:`show2 = true`]
[button onClick:`correct2 = false`]High Income[/button]
[button onClick:`correct2 = true`]Low Income[/button]
[/div]
[Answer correct:correct2 show:show2]
This mystery person is both on the younger side and has fewer years of education. As you can see in the graph below,
s/he is surrounded by low income individuals, so there is a high likelihood that this person follows suit.
[data name:"mystery_two" source:"mystery2.csv" /]
[IdyllVegaLite data:mystery_two spec:`{
renderer: 'svg',
mark: "circle",
encoding: {
x: {field: "Age", type: "quantitative"},
y: {field: "Years of Education", type: "quantitative"},
color: {field: "High Income", type: "nominal", scale: { domain: ["High", "Low", "Test"], range: ["#ff7f0f", "#1e77b4", "#000000"]}}
}
}` /]
[/Answer]
[/div]
[div className:'person-section']
[b]Mystery Person #3:[/b] [br/]
Age: 30 [br/]
Years of Education: 10
[var name:"correct3" value:`false` /]
[var name:"show3" value:`false` /]
[div className:'btn-row' onClick:`show3 = true`]
[button className:'evaluate-btn' onClick:`correct3 = false`]High Income[/button]
[button className:'evaluate-btn' onClick:`correct3 = true`]Low Income[/button]
[/div]
[Answer correct:correct3 show:show3]
This mystery person is middle-aged and has a medium number of years of education.
This was a tough question, as this person's age and years of education do not fall clearly on one side of the spectrum of income indicators.
Sometimes machine learning can lead to a wrong classification as patterns are not always clear-cut.
[data name:"mystery_three" source:"mystery3.csv" /]
[IdyllVegaLite data:mystery_three spec:`{
renderer: 'svg',
mark: "circle",
encoding: {
x: {field: "Age", type: "quantitative"},
y: {field: "Years of Education", type: "quantitative"},
color: {field: "High Income", type: "nominal", scale: { domain: ["High", "Low", "Test"], range: ["#ff7f0f", "#1e77b4", "#000000"]}}
}
}` /]
[/Answer]
[/div]
[/Step]
[/Scroller]
[Scroller currentStep: ml]
[Graphic className:"d3-component-container"]
[img src:"static/images/training.png" /]
[/Graphic]
[Step]
# Using Machine Learning
We will now complete the same classification task with three commonly used classification algorithms:
[a href:"#knn"]1. K Nearest Neighbors (KNN)[/a]
[a href:"#decisionTrees"]2. Decision Trees[/a]
[a href:"#logreg"]3. Logistic Regression[/a]
[/Step]
[Step]
[p className:'introSubtitle']Training[/p]
Before we get into the algorithms, we need to understand why models need training.
Training a model simply means having the model learn good values from a set of labeled examples.
Each example helps the model understand the relationship between the variables
(in this case, age and years of education) and the classification value (income).
Once the relationship is learned, the model can predict an unknown classification value for new sets of variables.
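As a rough illustration of this train-then-predict loop, here is a short scikit-learn sketch. The toy data points are made up for illustration, and any classifier could stand in for the one used here.

```python
# A toy sketch of training: fit on labeled examples, then predict new ones.
# The data points below are invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[57, 15], [22, 7], [30, 10]]   # [Age, Years of Education]
y_train = ["High", "Low", "Low"]          # known income labels

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)               # "training": learn from the examples
print(model.predict([[45, 12]]))          # classify a new, unseen person
```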
[/Step]
[Step]
[img src:"static/images/warning3.gif" /]
[p className:'introSubtitle']Warning: Overfitting[/p]
One thing to pay attention to when using machine learning algorithms is overfitting.
Overfitting happens when a model is too complex and starts to classify according
to random error in the data rather than the underlying relationships between variables.
A model is considered [b]“overfit” when it fits your training data really well, yet performs
poorly on new data[/b].
One way to identify overfitting is to reserve a portion of your data
set and introduce it after you have finished creating your model to see how it performs. If your model
performs poorly on the reserved set, it is overfit to the training data!
One way to control how the model performs is by adjusting its hyperparameters, which are higher-level properties of the model.
In our case, these are the value of K, the depth of the decision tree, and the threshold value for the logistic regression.
[img src:"static/images/under_over.png" /]
[/Step]
[/Scroller]
[Scroller currentStep: knn]
[Graphic className:"d3-component-container"]
[Conditional if:`knn_k === 3`]
[img src:"static/images/knn3.png" /]
[/Conditional]
[Conditional if:`knn_k === 6`]
[img src:"static/images/knn6.png" /]
[/Conditional]
[/Graphic]
[Step id:'knn']
# K Nearest Neighbors (KNN)
The K Nearest Neighbors (KNN) algorithm [b]classifies data based on the data that is most similar to it[/b].
KNN plots a test point alongside the training data points and classifies
the test point based on the classes of the K points closest to it.
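The whole idea fits in a few lines. Here is a from-scratch sketch in plain Python, with Euclidean distance assumed:

```python
# A from-scratch sketch of KNN: find the K closest training points and
# take a majority vote over their labels.
import math
from collections import Counter

def knn_predict(train_points, train_labels, test_point, k):
    # distance from the test point to every training point
    distances = [math.dist(p, test_point) for p in train_points]
    # indices of the k nearest training points
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # majority vote over the neighbors' labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```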
[/Step]
[Step]
In the example on the right, our “test point” is the red point. Click on the buttons to see how different values of K can lead to different classifications.
[var name:"knn_k" value:3 /]
[div className:'btn-row']
[button onClick:`knn_k = 3`]k = 3[/button]
[button onClick:`knn_k = 6`]k = 6[/button]
[/div]
[Conditional if:`knn_k === 3`]
When we set K = 3, we see that there are two points near it that are blue, and one that is black.
Since we would [b]classify based on the majority, we would classify our test point as blue[/b].
[/Conditional]
[Conditional if:`knn_k === 6`]
However, when we set K = 6, four of the nearest neighbors are black, while only two are blue.
Thus, when K = 6, we would [b]label our test point as black since black now has the majority[/b].
[/Conditional]
[/Step]
[/Scroller]
[Scroller currentStep: knn2]
[Step]
Here, we will apply KNN to our dataset, where you can interact and vary the number of neighbors (K) to see how it will
affect what the model predicts.
[Knn /]
[br/]
Play around with the number of neighbors (K) to see what our model predicts
for the last mystery person, with an age of 30 and 10 years of education.
If you remember from above, this person is classified as low income, but if you make K 2 or less,
the KNN algorithm actually predicts high income. This is because the [b]nearest neighbor is classified as high income[/b].
However, K = 2 presents an interesting problem when the two neighbors are classified differently. While there are many approaches to break
a tie, we are [b]using the "nearest" approach, which picks the class of the nearest neighbor[/b]. There is also the "random" approach, which
selects the class of a random neighbor.
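For reference, a scikit-learn sketch of this experiment might look as follows. Note that scikit-learn does not expose the "nearest" or "random" tie-breaking rules directly; `weights="distance"` is one way to favor the closer neighbor when a vote would otherwise tie.

```python
# A sketch of the demo above: vary K and see how the prediction for
# mystery person #3 (age 30, 10 years of education) changes.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("train_data.csv")
X, y = df[["Age", "Years of Education"]], df["High Income"]

mystery = pd.DataFrame([[30, 10]], columns=X.columns)
for k in [1, 2, 3, 5, 10]:
    model = KNeighborsClassifier(n_neighbors=k, weights="distance")
    model.fit(X, y)
    print(f"k={k}: {model.predict(mystery)[0]}")
```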
[/Step]
[/Scroller]
[Scroller currentStep: dt1]
[Graphic className:"d3-component-container"]
[img src:"static/images/dt.png" /]
[/Graphic]
[Step id:'decisionTrees']
# Decision Trees
The decision tree is an important machine learning algorithm that is commonly used for classification problems, just like this one!
The algorithm [b]follows a set of if-else conditions to classify data[/b]. An example of a decision tree is
shown on the right.
[var name:"showDtTrain" value:`false` /]
[button onClick:`showDtTrain = !showDtTrain`]
Click here to learn about how to train decision trees!
[/button]
[Conditional if:`showDtTrain === true`]
At a high level, decision trees work by reducing information entropy as the data moves down the tree and answers a set of questions.
Information entropy can be thought of as a measure of uncertainty or, analogously, of how surprised you would
be to discover the real label. If all you know is that half of the labels are true and the other half are false, your uncertainty
when making predictions would be very high. At the other extreme, if all labels are true, then you would not be surprised at all
upon discovering the real label. Your uncertainty would be very low and thus entropy would be 0.
At each step (also called a “split”), there is a question about the data at each node in the tree.
The format of a split is “which data points have a value less than V for attribute A?”.
1. Attributes that can take any numerical value within a range are called continuous. Both age and years of education are therefore considered continuous.
The important first step in the algorithm is to create discretized values for each attribute. To keep things simple, age is discretized in steps of 5
(20, 25, 30, ...) and years of education in steps of 2 (6, 8, 10, ...). Each discretization forms a candidate for a potential split.
[br/][br/]
2. Calculate the entropy of the data at the current node using the formula for Shannon information entropy as shown on the right.
This indicates how mixed the dataset is at this level; the goal is to create the split that reduces the entropy of the child nodes by the most.
[br/][br/]
3. For each discretization and for each attribute:
[br/][br/]
3a. Data points that have a value less than V for attribute A are passed to the left node, while data points that have a value greater than or equal to V for attribute
A are passed to the right node.
[br/][br/]
3b. Take a weighted average of the entropies of the two child nodes, where the weightings equal the number of data points at each child node divided by
the number of data points at the parent node. This average of child entropies is also called “the conditional entropy”.
[br/][br/]
3c. Subtract this weighted average from the parent entropy to calculate the information gain.
Choose the split that results in the highest information gain. To avoid making excessive splits, stop if the highest information gain is 0.
[br/][br/]
4. Iteratively repeat steps 2 and 3 until you have reached the max depth or until all the labels of the data points belong to the same class (remember
that this is when entropy equals 0).
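To make steps 2 and 3 concrete, here is a small plain-Python sketch of entropy and the information gain of one candidate split of the "attribute A < V" form described above:

```python
# A sketch of Shannon entropy (step 2) and the information gain of one
# candidate split "value < V" (steps 3a-3c).
import math

def entropy(labels):
    if not labels:
        return 0.0
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(values, labels, v):
    left  = [l for x, l in zip(values, labels) if x < v]   # step 3a
    right = [l for x, l in zip(values, labels) if x >= v]
    # weighted average of child entropies: the conditional entropy (step 3b)
    conditional = (len(left) / len(labels)) * entropy(left) \
                + (len(right) / len(labels)) * entropy(right)
    return entropy(labels) - conditional                   # step 3c
```

The algorithm would evaluate `information_gain` for every discretized candidate V of every attribute and keep the split with the highest gain.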
[/Conditional]
[/Step]
[Step]
[p className:'introSubtitle']What is depth?[/p]
The [b]depth of a decision tree refers to the number of layers that the tree has[/b]. Decision tree depth is a delicate balance, as [b]too much depth could cause overfitting,
and too little depth could lead to lower accuracy[/b]. To ensure that your tree has appropriate depth, there are two commonly used methods:
[br/][br/]
1. Grow the tree until it overfits, then start cutting back
[br/][br/]
2. Prevent the tree from growing by stopping it just before it perfectly classifies the training data
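Here is a sketch of the second method with scikit-learn, where `max_depth` caps the tree up front; the depths and split ratio are illustrative.

```python
# A sketch of pre-pruning: cap the depth up front and compare holdout
# accuracy across depths, keeping the depth that generalizes best.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("train_data.csv")
X = df[["Age", "Years of Education"]]
y = df["High Income"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
for depth in [1, 2, 3, 4]:
    tree = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```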
Let's look at how the tree performs at different depths below.
[/Step]
[/Scroller]
[Scroller currentStep: dt2]
[Step]
Below is a visualization of the decision tree run on our census dataset! Use the slider to change the maximum depth of
the decision tree.
[var name:"depth" value:1 /]
[Conditional if:`depth === 1`]
[p className:'introSubtitle']Depth 1[/p]
[img src:"static/images/depth1.png" style:`{width: '40%', marginLeft: '0'}` /]
[/Conditional]
[Conditional if:`depth === 2`]
[p className:'introSubtitle']Depth 2[/p]
[img src:"static/images/depth2.png" style:`{width: '65%', marginLeft: '0'}` /]
[/Conditional]
[Conditional if:`depth === 3`]
[p className:'introSubtitle']Depth 3[/p]
[img src:"static/images/depth3.png" style:`{width: '82%', marginLeft: '0'}` /]
[/Conditional]
[Conditional if:`depth === 4`]
[p className:'introSubtitle']Depth 4[/p]
[img src:"static/images/depth4.png" /]
[/Conditional]
1 [Range value:depth min:1 max:4 step:1 /] 4
[/Step]
[Step]
[BoundariesChart /]
[br/][br/]
Now, let’s gain a different perspective on how the decision tree makes predictions. [b]Each line in the graph above represents a split
in the decision tree[/b]. The graph shows how the decision tree breaks the data down into smaller and smaller rectangles and makes its
prediction based on which rectangle the data point lands in. [br/]
One interesting thing to note is that [b]as the max depth increases, the decision tree tries to account for the anomalous data point[/b]
(the 32 year old with 9 years of education who is a high earner). If we were to increase max depth past 4, the decision tree would wrap
around the square containing the anomalous data point and predict all future data points within that square as “High Income”.
What is this an example of? [br/]
[var name:"correct_dt" value:`false` /]
[var name:"show_dt" value:`false` /]
[div onClick:`show_dt= true`]
[button className:'evaluate-btn' onClick:`correct_dt = true`]Overfitting[/button]
[button className:'evaluate-btn' onClick:`correct_dt = false`]Underfitting[/button]
[/div]
[Answer correct:correct_dt show:show_dt /]
[/Step]
[/Scroller]
[Scroller currentStep: logreg]
[Graphic className:"d3-component-container"]
[Chart equation:`(t) => 1/(1 + Math.exp(-t))` domain:`[-20, 20]` range:`[-0.5, 1.5]` samplePoints:1000 /]
[Equation]
f(x)=\frac{1}{1+e^{-x}}
[/Equation]
[/Graphic]
[Step id:'logreg']
# Logistic Regression
Contrary to its name, logistic regression is not a regression algorithm, but a classification algorithm.
It works by performing a [b]linear regression based on the attributes, and then using that regression to
calculate the probability that a data point belongs to a certain class[/b].
[/Step]
[Step]
Before understanding logistic regression, one must first understand linear regression. Suppose we have a set
of points given as tuples (x, y). We can visualize this set of points in a scatterplot, such as the one right here:
[Iframe link: 'https://www.desmos.com/calculator/sgsjnaipwe?embed' /]
What if we wanted to find the line that best fits this data? Our data is noisy, so no line will fit it perfectly,
but one can try to draw one manually.
[Iframe link: 'https://www.desmos.com/calculator/plfizuubiv?embed' /]
Alternatively, one can use a method called “ordinary least squares”, or OLS for short.
[b]Since a line can be written as a function, any line can be used to create estimates of an output given
input values[/b]. Each point has a corresponding residual, which is computed by taking the difference between
the actual value and our estimated value. OLS is the method that calculates the slope and intercept of the
line that minimizes the sum of the squares of these residual values: [br/][br/]
[Equation]
\min_{m, b} \sum_{x_i \in points} (y_i - (m \cdot x_i + b))^2
[/Equation]
[b]This scary equation pretty much just helps us find the line that best fits our data points.[/b]
So, for our example, our ordinary least-squares regression line is:
[Equation]
Y=0.5x + 3
[/Equation]
[Iframe link: 'https://www.desmos.com/calculator/ja7f5rvr8v?embed' /]
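In code, OLS is a one-liner. Here is a numpy sketch with made-up noisy points (not the exact Desmos data):

```python
# A sketch of OLS: np.polyfit(x, y, 1) returns the slope and intercept
# that minimize the sum of squared residuals. The points are invented,
# chosen to lie near y = 0.5x + 3.
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([3.1, 3.4, 4.2, 4.3, 5.2, 5.4])

slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)   # roughly 0.49 and 3.05
```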
[/Step]
[Step]
Now that we know the basics of linear regression, we can move on to logistic regression! While the
theory behind logistic regression is beyond the scope of this article, we will explain the overall concept of
how to perform it.
For a given data point, we can find the OLS estimate using our least-squares regression line equation.
This [b]OLS output will be the x-value of the data point[/b]. The y-value of the data point is the label, which,
in our original example, is whether or not an individual has a high income ([b]we set y = 1 if the data point
has a high income, and y = 0 otherwise[/b]). For our dataset, we get the regression line of approximately[br/][br/]
[Equation]
Y = -16.4 + 0.25 \cdot \text{Age} + 0.53 \cdot \text{YoE},
[/Equation][br/][br/]
where YoE is short for the "Years of Education" value. We can then plot the points as such:
[Iframe link: 'https://www.desmos.com/calculator/k60y3zlqsc?embed' /]
[b]Notice how the points with X<0 are usually labeled with 0 (low income), while points with X>0 have a label of 1 (high income)[/b].
We can see that there is a point with X ≈ -3.593 that is a clear outlier. Looking back at our exploratory data analysis,
this is the outlier (the 32-year-old with 9 years of education) that is labeled as high income despite the fact that all
similar points are low income.
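In practice one would fit this model with a library rather than by hand. A scikit-learn sketch is below; the exact coefficients it recovers depend on the solver and regularization settings, so they will only approximate the line above.

```python
# A sketch of fitting logistic regression to the census data. A very large C
# means very little regularization, so the coefficients should land near the
# Y = -16.4 + 0.25*Age + 0.53*YoE line quoted above (though not exactly).
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("train_data.csv")
X = df[["Age", "Years of Education"]]
y = (df["High Income"] == "High").astype(int)   # y = 1 for high income

model = LogisticRegression(C=1e6, max_iter=10000)
model.fit(X, y)
print(model.intercept_, model.coef_)            # intercept, [Age, YoE] weights
```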
[/Step]
[Step]
Moving beyond the dataset, the natural next question is: [b]how would our model label points we haven't seen before[/b]?
To answer it, we fit a sigmoid function on top of our linear regression output. The sigmoid function is defined as:
[Equation]
f(x) =\frac{1}{1+e^{-x}}
[/Equation]
The sigmoid function is plotted on the right.
[b]The sigmoid function applied to a linear regression model provides the probability that we classify the output
as a 1 (having high income) given the linear regression output [/b], or
[Equation]
P(y=1 | x)
[/Equation]. The mathematical intuition behind
this is also beyond the scope of the article, but these probabilities are then used to classify unknown points. We can [b]set a threshold value,
and then, given the probability that a point is labeled as 1, use the threshold to decide how to classify it[/b].
[br/]
[br/]
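Putting the pieces together, the whole decision rule fits in a few lines. This sketch uses the approximate regression line from above:

```python
# A sketch of the full decision rule: linear score -> sigmoid -> threshold.
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def classify(age, yoe, threshold=0.5):
    score = -16.4 + 0.25 * age + 0.53 * yoe     # regression line from above
    return "High" if sigmoid(score) >= threshold else "Low"

print(classify(57, 15))   # mystery person #1 -> "High"
print(classify(22, 7))    # mystery person #2 -> "Low"
```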
Changing the threshold can change how we label points. Feel free to use the slider below to see how big our linear regression output
needs to be for the data to be labeled "high income" at different threshold values. [b]Note that the red line is the threshold probability, and the
shaded green area shows the corresponding values of X that would be classified as "1", or having high income[/b].
[var name:"threshold" value:0.5 /]
0.3 [Range value:threshold min:0.3 max:0.7 step:0.1 /] 0.7
Value of threshold: [Display value:threshold /].
[Conditional if:`threshold === 0.3`]
[Iframe link: 'https://www.desmos.com/calculator/t3yudcpd0x?embed' /]
[/Conditional]
[Conditional if:`threshold === 0.4`]
[Iframe link: 'https://www.desmos.com/calculator/5fralpxz5v?embed' /]
[/Conditional]
[Conditional if:`threshold === 0.5`]
[Iframe link: 'https://www.desmos.com/calculator/mil4gpaykv?embed' /]
[/Conditional]
[Conditional if:`threshold === 0.6`]
[Iframe link: 'https://www.desmos.com/calculator/0zmi5vkvor?embed' /]
[/Conditional]
[Conditional if:`threshold === 0.7`]
[Iframe link: 'https://www.desmos.com/calculator/z9ws9omj9b?embed' /]
[/Conditional]
[/Step]
[/Scroller]
[Scroller currentStep: model_selection]
[Graphic className:"d3-component-container"]
[img src:"static/images/machine_learning_cartoon.png" /]
[/Graphic]
[Step]
# Model Selection
Now that we have introduced the three classification models, [b]how would we decide which model to use[/b]?
Note that for the earlier parts of this article, we only ran the algorithms on 32 data points. However, the dataset these points
originally came from actually has 32,561 rows! If we run the algorithms on all of the rows (using the best hyperparameters for
each model), we get the following accuracies:
[data name:"model_select" source:"model_selection.csv" /]
[Table data:model_select defaultPageSize:5/]
[br/] We see that for each algorithm, the [b]training and test accuracies are close to each other, signifying that our models do not exhibit
overfitting[/b]. In addition, the accuracies are quite close between the different algorithms! This means that, on accuracy alone,
the performance of these models is similar. [b]As a future step, one can look at other measures of performance, such as precision and recall,
to determine which model is best[/b]. Other factors, such as the interpretability of a model, may also be
considerations when selecting models in the future.
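Here is a sketch of how such a comparison could be produced. The hyperparameter values are placeholders, not the tuned ones behind the table, and the full 32,561-row census file would be loaded in place of our 32-row sample.

```python
# A sketch of the model comparison: fit each algorithm, then report
# train and test accuracy side by side.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("train_data.csv")   # the full census file would go here
X = df[["Age", "Years of Education"]]
y = df["High Income"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_train, y_train), model.score(X_test, y_test))
```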
[/Step]
[/Scroller]
[Scroller currentStep: conclusion]
[Graphic className:"d3-component-container"]
[img src:"static/images/robot_graduated.png" /]
[/Graphic]
[Step]
# Conclusion
[p className:'introSubtitle']Great Job![/p]
We hope that you now have a good understanding of the basics of three different machine learning classification algorithms.
As you saw, these algorithms learn from the input data they are given and use it to classify new observations.
These algorithms perform analytical tasks that would take humans hundreds more hours to complete!
Machine learning has applications in many fields, such as medicine, defense, finance, and technology, just to name a few.
Classification algorithms are at the heart of many innovations like image recognition, speech recognition, self-driving cars,
email spam filtering, and ecommerce product recommendations. We hope you are inspired to further your knowledge of these topics!
[/Step]
[/Scroller]