[meta title:"Final Project" description:"Short description of your project" /]
[Header
fullWidth:true
title:"Machine Learning Classification"
authors:`[{
name: "Laura Howard",
link: "https://www.linkedin.com/in/laura-m-howard/"
}, {
name: "Christian Deverall",
link: "https://www.linkedin.com/in/christian-deverall-cmu/"
},
{
name: "Nathan Jen",
link: "https://www.linkedin.com/in/nathanjen/"
},
{
name: "Juliette Wong",
link: "https://www.linkedin.com/in/juliettenwong/"
}]`
date:`(new Date()).toDateString()`
background:"#222222"
color:"#ffffff"
/]
[Scroller currentStep: intro]
[Step]
# A Brief History of Classification
[aside]
[img src:"static/images/machine_learning_cartoon.png" /]
[/aside]
[p className:'introSubtitle']Can computers learn from data?[/p]
This question led to the development of machine learning. In 1959, Arthur Samuel (a pioneering computer scientist)
defined machine learning as “the field of study that [b]gives computers the ability to learn without being explicitly programmed[/b]”.
While this may sound complex, we hope to make some machine learning concepts approachable and easy to learn
through examples and activities in this article.
[/Step]
[Step]
[p className:'introSubtitle']Understanding Algorithms[/p]
[aside]
[img src:"static/images/ml_books.png" /]
[/aside]
Think of it this way: just as you can learn these concepts by experimenting with examples, computers can do the same!
To grasp how machine learning works, we need to understand [b]algorithms, the methods for creating models from data[/b].
In this article, we will show you popular algorithms that perform classification tasks on a simple dataset related to income.
[/Step]
[/Scroller]
[var name:"case" value:0 /]
[Scroller currentStep: case]
[Graphic className:"d3-component-container"]
[img src:"static/images/students.png" /]
[/Graphic]
[Step]
# Case Study
[p className:'introSubtitle']Is Income Predictable?[/p]
You are starting your senior year of high school and are [b]trying to decide whether to further your education beyond graduation[/b].
You are wondering whether the long-run returns will be high enough to let you live a comfortable life and pay off student loans.
To find out, you look at US census data and gather real examples of people's:
[p className:'numbered-list']
1. Age [br/]
2. Number of years of education [br/]
3. What their income is [br/]
[/p]
You classify income as high if it exceeds $50,000 (the data comes from the 1994 US Census Bureau database; $50,000 is worth ~$88,000 in 2020).
Let's take a quick look at the data below:
[data name:"train" source:"train_data.csv" /]
[Table data:train defaultPageSize:8/]
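If you want to follow along in code, here is a minimal pandas sketch of loading and inspecting this table. The file name and column names match the dataset shown above; everything else is illustrative.

```python
# A minimal sketch, assuming train_data.csv has the columns shown above:
# "Age", "Years of Education", and "High Income".
import pandas as pd

df = pd.read_csv("train_data.csv")
print(df.head(8))                         # the same preview as the table above
print(df["High Income"].value_counts())   # 16 "High" and 16 "Low" rows
```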
[/Step]
[/Scroller]
[Scroller currentStep: eda]
[Step]
[p className:'introSubtitle']Exploring the dataset[/p]
As you probably realized, [b]looking at raw data isn't very useful.[/b] So now let's look at visualizations to see what
more we can learn from this dataset!
[IdyllVegaLite data:train spec:`{
renderer: 'svg',
mark: "bar",
encoding: {
y: {field: "High Income"},
x: {aggregate: "count", title: "Number of People with High Income"},
color: {field: "High Income", type: "nominal"},
}
}`/]
First, let’s look at a simple bar chart showing the number of people with and without high income.
As you can see, there is an equal number of people (16) with high and low income in the dataset,
totaling 32 records.
[aside]
Color this chart by:
[var name:"colorBy" value:`{value: "#696969"}` /]
[button onClick:`colorBy = {field: 'High Income', type: 'nominal'}` ]High Income[/button]
[button onClick:`colorBy = {value: "#696969"}` ]None[/button]
[/aside]
[IdyllVegaLite data:train spec:`{
renderer: 'svg',
mark: "bar",
encoding: {
x: {field: "Age", bin: true},
y: {aggregate: "count", title: "Number of People"},
color: colorBy
}
}` /]
Next, let’s look at the spread of people’s ages. When the High Income button on the right is selected, you can see
the ages of the people who have a high income.
[Conditional if:`colorBy.field`]
As you can see, it seems as if [b]older people are more likely to be high income[/b]. This makes
sense as more experience usually means a better salary!
[/Conditional]
[aside]
Color this chart by:
[var name:"colorBy2" value:`{value: "#696969"}` /]
[button onClick:`colorBy2 = {field: 'High Income', type: 'nominal'}` ]High Income[/button]
[button onClick:`colorBy2 = {value: "#696969"}` ]None[/button]
[/aside]
[IdyllVegaLite data:train spec:`{
renderer: 'svg',
mark: "bar",
encoding: {
x: {field: "Years of Education", axis: {labelAngle: 0}},
y: {aggregate: "count", title: "Number of People"},
color: colorBy2
}
}` /]
This graph shows the number of people by years of education. When the High Income button is selected, you
can see how many years of education people with a high income have.
[Conditional if:`colorBy2.field`]
As you can see from this graph, those with [b]high income tend to be the ones with more years of education[/b].
However, [b]this is not always the case[/b], as you will see some low income individuals
with 9, 10, 11, and 13 years of education.
[/Conditional]
[IdyllVegaLite data:train spec:`{
renderer: 'svg',
mark: "circle",
encoding: {
x: {field: "Age", type: "quantitative"},
y: {field: "Years of Education", type: "quantitative"},
color: {field: "High Income", type: "nominal"},
tooltip: {field: "High Income", type: "nominal"}
}
}` /]
Finally, let’s look at the age and education variables together in a scatterplot.
After looking at the data, do you think more years of schooling truly result in a higher income for you?
[/Step]
[/Scroller]
[Scroller currentStep: ml]
[Graphic className:"d3-component-container"]
[img src:"static/images/mystery-person.png" /]
[/Graphic]
[Step]
[p className:'introSubtitle']Evaluating Your Model[/p]
After exploring the data, we hope that you got a sense of which patterns are more likely
to indicate that someone is high income. By doing this, [b]you were actually training your own model[/b], which is
almost exactly what machine learning models do! Now that your model is trained, let's put it to the test
and have you predict the income of 3 mystery people.
[div className:'person-section']
[b]Mystery Person #1:[/b] [br/]
Age: 57 [br/]
Years of Education: 15
[var name:"correct1" value:`false` /]
[var name:"show1" value:`false` /]
[div className:'btn-row' onClick:`show1 = true`]
[button className:'evaluate-btn' onClick:`correct1 = true`]High Income[/button]
[button className:'evaluate-btn' onClick:`correct1 = false`]Low Income[/button]
[/div]
[Answer correct:correct1 show:show1]
This mystery person is both on the older side and well educated. As you can see in the graph below, s/he is surrounded
by high income individuals, so there is a high likelihood that this person follows suit. [br/]
[data name:"mystery_one" source:"mystery1.csv" /]
[IdyllVegaLite data:mystery_one spec:`{
renderer: 'svg',
mark: "circle",
encoding: {
x: {field: "Age", type: "quantitative"},
y: {field: "Years of Education", type: "quantitative"},
color: {field: "High Income", type: "nominal", scale: { domain: ["High", "Low", "Test"], range: ["#ff7f0f", "#1e77b4", "#000000"]}}
}
}` /]
[/Answer]
[/div]
[div className:'person-section']
[b]Mystery Person #2:[/b] [br/]
Age: 22 [br/]
Years of Education: 7
[var name:"correct2" value:`false` /]
[var name:"show2" value:`false` /]
[div className:'btn-row' onClick:`show2 = true`]
[button onClick:`correct2 = false`]High Income[/button]
[button onClick:`correct2 = true`]Low Income[/button]
[/div]
[Answer correct:correct2 show:show2]
This mystery person is both on the younger side and has fewer years of education. As you can see in the graph below,
s/he is surrounded by low income individuals, so there is a high likelihood that this person follows suit.
[data name:"mystery_two" source:"mystery2.csv" /]
[IdyllVegaLite data:mystery_two spec:`{
renderer: 'svg',
mark: "circle",
encoding: {
x: {field: "Age", type: "quantitative"},
y: {field: "Years of Education", type: "quantitative"},
color: {field: "High Income", type: "nominal", scale: { domain: ["High", "Low", "Test"], range: ["#ff7f0f", "#1e77b4", "#000000"]}}
}
}` /]
[/Answer]
[/div]
[div className:'person-section']
[b]Mystery Person #3:[/b] [br/]
Age: 30 [br/]
Years of Education: 10
[var name:"correct3" value:`false` /]
[var name:"show3" value:`false` /]
[div className:'btn-row' onClick:`show3 = true`]
[button className:'evaluate-btn' onClick:`correct3 = false`]High Income[/button]
[button className:'evaluate-btn' onClick:`correct3 = true`]Low Income[/button]
[/div]
[Answer correct:correct3 show:show3]
This mystery person is middle-aged and has a medium number of years of education.
This was a tough question, as this person's age and years of education do not fall clearly on one side of the spectrum of income indicators.
Sometimes machine learning can lead to a wrong classification as patterns are not always clear-cut.
[data name:"mystery_three" source:"mystery3.csv" /]
[IdyllVegaLite data:mystery_three spec:`{
renderer: 'svg',
mark: "circle",
encoding: {
x: {field: "Age", type: "quantitative"},
y: {field: "Years of Education", type: "quantitative"},
color: {field: "High Income", type: "nominal", scale: { domain: ["High", "Low", "Test"], range: ["#ff7f0f", "#1e77b4", "#000000"]}}
}
}` /]
[/Answer]
[/div]
[/Step]
[/Scroller]
[Scroller currentStep: ml]
[Graphic className:"d3-component-container"]
[img src:"static/images/training.png" /]
[/Graphic]
[Step]
# Using Machine Learning
We will now complete the same classification task with three commonly used classification algorithms:
[a href:"#knn"]1. K Nearest Neighbors (KNN)[/a]
[a href:"#decisionTrees"]2. Decision Trees[/a]
[a href:"#logreg"]3. Logistic Regression[/a]
[/Step]
[Step]
[p className:'introSubtitle']Training[/p]
Before we get into the algorithms, we need to understand why models need training.
Training a model simply means having the model learn good values from a set of labeled examples.
Each example helps the model understand the relationship between the variables
(in this case, age and years of education) and the classification value (income).
Once the relationship is learned, the model can predict an unknown classification value for new sets of variables.
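As a rough illustration of this train-then-predict loop, here is a short scikit-learn sketch. The toy data points are made up for illustration, and any classifier could stand in for the one used here.

```python
# A toy sketch of training: fit on labeled examples, then predict new ones.
# The data points below are invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[57, 15], [22, 7], [30, 10]]   # [Age, Years of Education]
y_train = ["High", "Low", "Low"]          # known income labels

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)               # "training": learn from the examples
print(model.predict([[45, 12]]))          # classify a new, unseen person
```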
[/Step]
[Step]
[img src:"static/images/warning3.gif" /]
[p className:'introSubtitle']Warning: Overfitting[/p]
One thing to pay attention to when using machine learning algorithms is overfitting.
Overfitting happens when a model is too complex and starts to classify according
to random error in the data rather than the underlying relationships between variables.
A model is considered [b]“overfit” when it fits your training data really well, yet performs
poorly on new data[/b].
One way to identify overfitting is to reserve a portion of your data
set and introduce it after you have finished creating your model to see how it performs. If your model
performs poorly on the reserved set, it is overfit to the training data!
One way to control how the model performs is by adjusting its hyperparameters, which are higher-level properties of the model.
In our case, these are the value of K, the depth of the decision tree, and the threshold value for the logistic regression.
[img src:"static/images/under_over.png" /]
[/Step]
[/Scroller]
[Scroller currentStep: knn]
[Graphic className:"d3-component-container"]
[Conditional if:`knn_k === 3`]
[img src:"static/images/knn3.png" /]
[/Conditional]
[Conditional if:`knn_k === 6`]
[img src:"static/images/knn6.png" /]
[/Conditional]
[/Graphic]
[Step id:'knn']
# K Nearest Neighbors (KNN)
The K Nearest Neighbors (KNN) algorithm [b]classifies data based on the data that is most similar to it[/b].
KNN plots a test point alongside the training data points and classifies
the test point based on the classes of the K points closest to it.
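The whole idea fits in a few lines. Here is a from-scratch sketch in plain Python, with Euclidean distance assumed:

```python
# A from-scratch sketch of KNN: find the K closest training points and
# take a majority vote over their labels.
import math
from collections import Counter

def knn_predict(train_points, train_labels, test_point, k):
    # distance from the test point to every training point
    distances = [math.dist(p, test_point) for p in train_points]
    # indices of the k nearest training points
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # majority vote over the neighbors' labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```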
[/Step]
[Step]
In the example on the right, our “test point” is the red point. Click on the buttons to see how different values of K can lead to different classifications.
[var name:"knn_k" value:3 /]
[div className:'btn-row']
[button onClick:`knn_k = 3`]k = 3[/button]
[button onClick:`knn_k = 6`]k = 6[/button]
[/div]
[Conditional if:`knn_k === 3`]
When we set K = 3, we see that there are two points near it that are blue, and one that is black.
Since we would [b]classify based on the majority, we would classify our test point as blue[/b].
[/Conditional]
[Conditional if:`knn_k === 6`]
However, when we set K = 6, four of the nearest neighbors are black, while only two are blue.
Thus, when K = 6, we would [b]label our test point as black since black now has the majority[/b].
[/Conditional]
[/Step]
[/Scroller]
[Scroller currentStep: knn2]
[Step]
Here, we will apply KNN to our dataset, where you can interact and vary the number of neighbors (K) to see how it will
affect what the model predicts.
[Knn /]
[br/]
Play around with the number of neighbors (K) to see what our model predicts
for the last mystery person, with an age of 30 and 10 years of education.
If you remember from above, this person is classified as low income, but if you make K 2 or less,
the KNN algorithm actually predicts high income. This is because the [b]nearest neighbor is classified as high income[/b].
However, K = 2 presents an interesting problem when the two neighbors are classified differently. While there are many approaches to break
a tie, we are [b]using the "nearest" approach, which picks the class of the nearest neighbor[/b]. There is also the "random" approach, which
selects the class of a random neighbor.
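For reference, a scikit-learn sketch of this experiment might look as follows. Note that scikit-learn does not expose the "nearest" or "random" tie-breaking rules directly; `weights="distance"` is one way to favor the closer neighbor when a vote would otherwise tie.

```python
# A sketch of the demo above: vary K and see how the prediction for
# mystery person #3 (age 30, 10 years of education) changes.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("train_data.csv")
X, y = df[["Age", "Years of Education"]], df["High Income"]

mystery = pd.DataFrame([[30, 10]], columns=X.columns)
for k in [1, 2, 3, 5, 10]:
    model = KNeighborsClassifier(n_neighbors=k, weights="distance")
    model.fit(X, y)
    print(f"k={k}: {model.predict(mystery)[0]}")
```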
[/Step]
[/Scroller]
[Scroller currentStep: dt1]
[Graphic className:"d3-component-container"]
[img src:"static/images/dt.png" /]
[/Graphic]
[Step id:'decisionTrees']
# Decision Trees
The decision tree is an important machine learning algorithm that is commonly used for classification problems, just like this one!
The algorithm [b]follows a set of if-else conditions to classify data[/b]. An example of a decision tree is
shown on the right.
[var name:"showDtTrain" value:`false` /]
[button onClick:`showDtTrain = !showDtTrain`]
Click here to learn about how to train decision trees!
[/button]
[Conditional if:`showDtTrain === true`]
At a high level, decision trees work by reducing information entropy as the data moves down the tree and answers a set of questions.
Information entropy can be thought of as a measure of uncertainty or, analogously, of how surprised you would
be to discover the real label. If all you know is that half of the labels are true and the other half are false, your uncertainty
when making predictions would be very high. At the other extreme, if all labels are true, then you would not be surprised at all
upon discovering the real label. Your uncertainty would be very low and thus entropy would be 0.
At each step (also called a “split”), there is a question about the data at each node in the tree.
The format of a split is “which data points have a value less than V for attribute A?”.
1. Attributes that can take any numerical value within a range are called continuous. Both age and years of education are therefore considered continuous.
The important first step in the algorithm is to create discretized values for each attribute. To keep things simple, age is discretized in steps of 5
(20, 25, 30, ...) and years of education in steps of 2 (6, 8, 10, ...). Each discretization forms a candidate for a potential split.
[br/][br/]
2. Calculate the entropy of the data at the current node using the formula for Shannon information entropy as shown on the right.
This indicates how mixed the dataset is at this level; the goal is to create the split that reduces the entropy of the child nodes by the most.
[br/][br/]
3. For each discretization and for each attribute:
[br/][br/]
3a. Data points that have a value less than V for attribute A are passed to the left node, while data points that have a value greater than or equal to V for attribute
A are passed to the right node.
[br/][br/]
3b. Take a weighted average of the entropies of the two child nodes, where the weightings equal the number of data points at each child node divided by
the number of data points at the parent node. This average of child entropies is also called “the conditional entropy”.
[br/][br/]
3c. Subtract this weighted average from the parent entropy to calculate the information gain.
Choose the split that results in the highest information gain. To avoid making excessive splits, stop if the highest information gain is 0.
[br/][br/]
4. Iteratively repeat steps 2 and 3 until you have reached the max depth or until all the labels of the data points belong to the same class (remember
that this is when entropy equals 0).
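To make steps 2 and 3 concrete, here is a small plain-Python sketch of entropy and the information gain of one candidate split of the "attribute A < V" form described above:

```python
# A sketch of Shannon entropy (step 2) and the information gain of one
# candidate split "value < V" (steps 3a-3c).
import math

def entropy(labels):
    if not labels:
        return 0.0
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(values, labels, v):
    left  = [l for x, l in zip(values, labels) if x < v]   # step 3a
    right = [l for x, l in zip(values, labels) if x >= v]
    # weighted average of child entropies: the conditional entropy (step 3b)
    conditional = (len(left) / len(labels)) * entropy(left) \
                + (len(right) / len(labels)) * entropy(right)
    return entropy(labels) - conditional                   # step 3c
```

The algorithm would evaluate `information_gain` for every discretized candidate V of every attribute and keep the split with the highest gain.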
[/Conditional]
[/Step]
[Step]
[p className:'introSubtitle']What is depth?[/p]
The [b]depth of a decision tree refers to the number of layers that the tree has[/b]. Decision tree depth is a delicate balance, as [b]too much depth could cause overfitting,
and too little depth could lead to lower accuracy[/b]. To ensure that your tree has appropriate depth, there are two commonly used methods:
[br/][br/]
1. Grow the tree until it overfits, then start cutting back
[br/][br/]
2. Prevent the tree from growing by stopping it just before it perfectly classifies the training data
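Here is a sketch of the second method with scikit-learn, where `max_depth` caps the tree up front; the depths and split ratio are illustrative.

```python
# A sketch of pre-pruning: cap the depth up front and compare holdout
# accuracy across depths, keeping the depth that generalizes best.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("train_data.csv")
X = df[["Age", "Years of Education"]]
y = df["High Income"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
for depth in [1, 2, 3, 4]:
    tree = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```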
Let's look at how the tree performs at different depths below.
[/Step]
[/Scroller]
[Scroller currentStep: dt2]
[Step]
Below is a visualization of the decision tree run on our census dataset! Use the slider to change the maximum depth of
the decision tree.
[var name:"depth" value:1 /]
[Conditional if:`depth === 1`]
[p className:'introSubtitle']Depth 1[/p]
[img src:"static/images/depth1.png" style:`{width: '40%', marginLeft: '0'}` /]
[/Conditional]
[Conditional if:`depth === 2`]
[p className:'introSubtitle']Depth 2[/p]
[img src:"static/images/depth2.png" style:`{width: '65%', marginLeft: '0'}` /]
[/Conditional]
[Conditional if:`depth === 3`]
[p className:'introSubtitle']Depth 3[/p]
[img src:"static/images/depth3.png" style:`{width: '82%', marginLeft: '0'}` /]
[/Conditional]
[Conditional if:`depth === 4`]
[p className:'introSubtitle']Depth 4[/p]
[img src:"static/images/depth4.png" /]
[/Conditional]
1 [Range value:depth min:1 max:4 step:1 /] 4
[/Step]
[Step]
[BoundariesChart /]
[br/][br/]
Now, let’s gain a different perspective on how the decision tree makes predictions. [b]Each line in the graph above represents a split
in the decision tree[/b]. The graph shows how the decision tree breaks the data down into smaller and smaller rectangles and makes its
prediction based on which rectangle the data point lands in. [br/]
One interesting thing to note is that [b]as the max depth increases, the decision tree tries to account for the anomalous data point[/b]
(the 32 year old with 9 years of education who is a high earner). If we were to increase max depth past 4, the decision tree would wrap
around the square containing the anomalous data point and predict all future data points within that square as “High Income”.
What is this an example of? [br/]
[var name:"correct_dt" value:`false` /]
[var name:"show_dt" value:`false` /]
[div onClick:`show_dt= true`]
[button className:'evaluate-btn' onClick:`correct_dt = true`]Overfitting[/button]
[button className:'evaluate-btn' onClick:`correct_dt = false`]Underfitting[/button]
[/div]
[Answer correct:correct_dt show:show_dt /]
[/Step]
[/Scroller]
[Scroller currentStep: logreg]
[Graphic className:"d3-component-container"]
[Chart equation:`(t) => 1/(1 + Math.exp(-t))` domain:`[-20, 20]` range:`[-0.5, 1.5]` samplePoints:1000 /]
[Equation]
f(x)=\frac{1}{1+e^{-x}}
[/Equation]
[/Graphic]
[Step id:'logreg']
# Logistic Regression
Contrary to its name, logistic regression is not a regression algorithm, but a classification algorithm.
It works by performing a [b]linear regression based on the attributes, and then using that regression to
calculate the probability that a data point belongs to a certain class[/b].
[/Step]
[Step]
Before understanding logistic regression, one must first understand linear regression. Suppose we have a set
of points given as tuples (x, y). We can visualize this set of points in a scatterplot, such as the one right here:
[Iframe link: 'https://www.desmos.com/calculator/sgsjnaipwe?embed' /]
What if we wanted to find the line that best fits this data? Our data is noisy, so no line will fit it perfectly,
but one can try to draw one manually.
[Iframe link: 'https://www.desmos.com/calculator/plfizuubiv?embed' /]
Alternatively, one can use a method called “ordinary least squares”, or OLS for short.
[b]Since a line can be written as a function, any line can be used to create estimates of an output given
input values[/b]. Each point has a corresponding residual, which is computed by taking the difference between
the actual value and our estimated value. OLS is the method that calculates the slope and intercept of the
line that minimizes the sum of the squares of these residual values: [br/][br/]
[Equation]
\min_{m, b} \sum_{x_i \in points} (y_i - (m \cdot x_i + b))^2
[/Equation]
[b]This scary equation pretty much just helps us find the line that best fits our data points.[/b]
So, for our example, our ordinary least-squares regression line is:
[Equation]
Y=0.5x + 3
[/Equation]
[Iframe link: 'https://www.desmos.com/calculator/ja7f5rvr8v?embed' /]
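In code, OLS is a one-liner. Here is a numpy sketch with made-up noisy points (not the exact Desmos data):

```python
# A sketch of OLS: np.polyfit(x, y, 1) returns the slope and intercept
# that minimize the sum of squared residuals. The points are invented,
# chosen to lie near y = 0.5x + 3.
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([3.1, 3.4, 4.2, 4.3, 5.2, 5.4])

slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)   # roughly 0.49 and 3.05
```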
[/Step]
[Step]
Now that we know the basics of linear regression, we can move on to logistic regression! While the
theory behind logistic regression is beyond the scope of this article, we will explain the overall concept of
how to perform it.
For a given data point, we can find the OLS estimate using our least-squares regression line equation.
This [b]OLS output will be the x-value of the data point[/b]. The y-value of the data point is the label, which,
in our original example, is whether or not an individual has a high income ([b]we set y = 1 if the data point
has a high income, and y = 0 otherwise[/b]). For our dataset, we get the regression line of approximately[br/][br/]
[Equation]
Y = -16.4 + 0.25 \cdot \text{Age} + 0.53 \cdot \text{YoE},
[/Equation][br/][br/]
where YoE is short for the "Years of Education" value. We can then plot the points as such:
[Iframe link: 'https://www.desmos.com/calculator/k60y3zlqsc?embed' /]
[b]Notice how the points with X<0 are usually labeled with 0 (low income), while points with X>0 have a label of 1 (high income)[/b].
We can see that there is a point with X ≈ -3.593 that is a clear outlier. Looking back at our exploratory data analysis,
this is the outlier (the 32-year-old with 9 years of education) that is labeled as high income despite the fact that all
similar points are low income.
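In practice one would fit this model with a library rather than by hand. A scikit-learn sketch is below; the exact coefficients it recovers depend on the solver and regularization settings, so they will only approximate the line above.

```python
# A sketch of fitting logistic regression to the census data. A very large C
# means very little regularization, so the coefficients should land near the
# Y = -16.4 + 0.25*Age + 0.53*YoE line quoted above (though not exactly).
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("train_data.csv")
X = df[["Age", "Years of Education"]]
y = (df["High Income"] == "High").astype(int)   # y = 1 for high income

model = LogisticRegression(C=1e6, max_iter=10000)
model.fit(X, y)
print(model.intercept_, model.coef_)            # intercept, [Age, YoE] weights
```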
[/Step]
[Step]
Moving beyond the dataset, the natural next question is: [b]how would our model label points we haven't seen before[/b]?
To answer it, we fit a sigmoid function on top of our linear regression output. The sigmoid function is defined as:
[Equation]
f(x) =\frac{1}{1+e^{-x}}
[/Equation]
The sigmoid function is plotted on the right.
[b]The sigmoid function applied to a linear regression model provides the probability that we classify the output
as a 1 (having high income) given the linear regression output [/b], or
[Equation]
P(y=1 | x)
[/Equation]. The mathematical intuition behind
this is also beyond the scope of the article, but these probabilities are then used to classify unknown points. We can [b]set a threshold value,
and then, given the probability that a point is labeled as 1, use the threshold to decide how to classify it[/b].
[br/]
[br/]
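Putting the pieces together, the whole decision rule fits in a few lines. This sketch uses the approximate regression line from above:

```python
# A sketch of the full decision rule: linear score -> sigmoid -> threshold.
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def classify(age, yoe, threshold=0.5):
    score = -16.4 + 0.25 * age + 0.53 * yoe     # regression line from above
    return "High" if sigmoid(score) >= threshold else "Low"

print(classify(57, 15))   # mystery person #1 -> "High"
print(classify(22, 7))    # mystery person #2 -> "Low"
```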
Changing the threshold can change how we label points. Feel free to use the slider below to see how big our linear regression output
needs to be for the data to be labeled "high income" at different threshold values. [b]Note that the red line is the threshold probability, and the
shaded green area shows the corresponding values of X that would be classified as "1", or having high income[/b].
[var name:"threshold" value:0.5 /]
0.3 [Range value:threshold min:0.3 max:0.7 step:0.1 /] 0.7
Value of threshold: [Display value:threshold /].
[Conditional if:`threshold === 0.3`]
[Iframe link: 'https://www.desmos.com/calculator/t3yudcpd0x?embed' /]
[/Conditional]
[Conditional if:`threshold === 0.4`]
[Iframe link: 'https://www.desmos.com/calculator/5fralpxz5v?embed' /]
[/Conditional]
[Conditional if:`threshold === 0.5`]
[Iframe link: 'https://www.desmos.com/calculator/mil4gpaykv?embed' /]
[/Conditional]
[Conditional if:`threshold === 0.6`]
[Iframe link: 'https://www.desmos.com/calculator/0zmi5vkvor?embed' /]
[/Conditional]
[Conditional if:`threshold === 0.7`]
[Iframe link: 'https://www.desmos.com/calculator/z9ws9omj9b?embed' /]
[/Conditional]
[/Step]
[/Scroller]
[Scroller currentStep: model_selection]
[Graphic className:"d3-component-container"]
[img src:"static/images/machine_learning_cartoon.png" /]
[/Graphic]
[Step]
# Model Selection
Now that we have introduced the three classification models, [b]how would we decide which model to use[/b]?
Note that for the earlier parts of this article, we only ran the algorithms on 32 data points. However, the dataset these points
originally came from actually has 32,561 rows! If we run the algorithms on all of the rows (using the best hyperparameters for
each model), we get the following accuracies:
[data name:"model_select" source:"model_selection.csv" /]
[Table data:model_select defaultPageSize:5/]
[br/] We see that for each algorithm, the [b]training and test accuracies are close to each other, signifying that our models do not exhibit
overfitting[/b]. In addition, the accuracies are quite close between the different algorithms! This means that, on accuracy alone,
the performance of these models is similar. [b]As a future step, one can look at other measures of performance, such as precision and recall,
to determine which model is best[/b]. Other factors, such as the interpretability of a model, may also be
considerations when selecting models in the future.
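Here is a sketch of how such a comparison could be produced. The hyperparameter values are placeholders, not the tuned ones behind the table, and the full 32,561-row census file would be loaded in place of our 32-row sample.

```python
# A sketch of the model comparison: fit each algorithm, then report
# train and test accuracy side by side.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("train_data.csv")   # the full census file would go here
X = df[["Age", "Years of Education"]]
y = df["High Income"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_train, y_train), model.score(X_test, y_test))
```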
[/Step]
[/Scroller]
[Scroller currentStep: conclusion]
[Graphic className:"d3-component-container"]
[img src:"static/images/robot_graduated.png" /]
[/Graphic]
[Step]
# Conclusion
[p className:'introSubtitle']Great Job![/p]
We hope that you now have a good understanding of the basics of three different machine learning classification algorithms.
As you saw, these algorithms learn from the input data they are given and use it to classify new observations.
These algorithms perform analytical tasks that would take humans hundreds more hours to complete!
Machine learning has applications in many fields, such as medicine, defense, finance, and technology, just to name a few.
Classification algorithms are at the heart of many innovations like image recognition, speech recognition, self-driving cars,
email spam filtering, and ecommerce product recommendations. We hope you are inspired to further your knowledge of these topics!
[/Step]
[/Scroller]