<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="img/favicon.ico">
<title>Early Exit - Neural Network Distiller</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="css/theme.css" type="text/css" />
<link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
<link href="extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Early Exit";
var mkdocs_page_input_path = "algo_earlyexit.md";
var mkdocs_page_url = null;
</script>
<script src="js/jquery-2.1.1.min.js" defer></script>
<script src="js/modernizr-2.8.3.min.js" defer></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
<input type="text" name="q" placeholder="Search docs" title="Type search term here" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="index.html">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="install.html">Installation</a>
</li>
<li class="toctree-l1">
<a class="" href="usage.html">Usage</a>
</li>
<li class="toctree-l1">
<a class="" href="schedule.html">Compression Scheduling</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Compressing Models</span>
<ul class="subnav">
<li class="">
<a class="" href="pruning.html">Pruning</a>
</li>
<li class="">
<a class="" href="regularization.html">Regularization</a>
</li>
<li class="">
<a class="" href="quantization.html">Quantization</a>
</li>
<li class="">
<a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
</li>
<li class="">
<a class="" href="conditional_computation.html">Conditional Computation</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Algorithms</span>
<ul class="subnav">
<li class="">
<a class="" href="algo_pruning.html">Pruning</a>
</li>
<li class="">
<a class="" href="algo_quantization.html">Quantization</a>
</li>
<li class=" current">
<a class="current" href="algo_earlyexit.html">Early Exit</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#early-exit-inference">Early Exit Inference</a></li>
<ul>
<li><a class="toctree-l4" href="#why-does-early-exit-work">Why Does Early Exit Work?</a></li>
<li><a class="toctree-l4" href="#example-code-for-early-exit">Example code for Early Exit</a></li>
<li><a class="toctree-l4" href="#references">References</a></li>
</ul>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="" href="model_zoo.html">Model Zoo</a>
</li>
<li class="toctree-l1">
<a class="" href="jupyter.html">Jupyter Notebooks</a>
</li>
<li class="toctree-l1">
<a class="" href="design.html">Design</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Tutorials</span>
<ul class="subnav">
<li class="">
<a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
</li>
<li class="">
<a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
</li>
</ul>
</li>
</ul>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">Neural Network Distiller</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> »</li>
<li>Algorithms »</li>
<li>Early Exit</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<h1 id="early-exit-inference">Early Exit Inference</h1>
<p>While deep neural networks benefit from a large number of layers, many data points in classification tasks can often be classified accurately with much less work. Several recent studies have explored the idea of exiting before the normal endpoint of the neural network. Panda et al. in <a href="#panda">Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition</a> point out that many data points can be classified easily and require less processing than more difficult points, and they view this in terms of power savings. Teerapittayanon et al. in <a href="#branchynet">BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks</a> look at a selective approach to exit placement and at criteria for exiting early.</p>
<h2 id="why-does-early-exit-work">Why Does Early Exit Work?</h2>
<p>Early Exit is a strategy with a straightforward and easy-to-understand concept. The figure below shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we’re confident of avoiding over-fitting the data), it’s also clear that much of the data can be properly classified with even the simplest of classification boundaries.</p>
<p><img alt="Simple and more expressive classification boundaries" src="imgs/decision_boundary.png" /></p>
<p>Data points far from the boundary can be considered "easy to classify" and reach a high degree of confidence more quickly than data points close to the boundary. In fact, we can think of the area between the outer straight lines as the region that is "difficult to classify", requiring the full expressiveness of the neural network to classify accurately.</p>
<h2 id="example-code-for-early-exit">Example code for Early Exit</h2>
<p>Both the CIFAR10 and ImageNet code come directly from publicly available PyTorch examples. The only edits are the exits, which are inserted following a methodology similar to that of BranchyNet.</p>
<p><strong>Note:</strong> the sample code provided for ResNet models with Early Exits has exactly one early exit for the CIFAR10 example and exactly two early exits for the ImageNet example. Deeper networks can benefit from multiple exits; our examples illustrate a single exit for CIFAR10 and a pair of exits for ImageNet. If you want to change the number of early exits, you will need to update the model code to have a corresponding number of exits.</p>
<p>Note that this code does not actually take the exits at runtime. Instead, it computes statistics of the loss and accuracy as if the exits had been taken whenever their criteria were met. Actually implementing exits can be tricky and architecture-dependent, and we plan to address these issues.</p>
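<p>To make the statistics-gathering concrete, the following is a minimal sketch of how per-exit counts and accuracy could be tallied by comparing a per-sample cross-entropy measure at each exit against that exit's threshold. The helper name and signature are hypothetical; this is an illustration of the idea, not the actual Distiller implementation.</p>
<pre><code class="python">import torch
import torch.nn.functional as F

def gather_exit_stats(exit_outputs, target, thresholds):
    # Illustrative sketch (hypothetical helper): decide, per sample, which exit
    # it would have taken by comparing the cross-entropy at each exit against
    # that exit's threshold. The final exit takes every remaining sample.
    batch_size = target.size(0)
    taken = torch.zeros(batch_size, dtype=torch.bool)   # samples already "exited"
    all_thresholds = list(thresholds) + [float('inf')]  # no threshold on the final exit
    exit_counts = [0] * len(exit_outputs)
    exit_correct = [0] * len(exit_outputs)
    for i, (output, threshold) in enumerate(zip(exit_outputs, all_thresholds)):
        ce = F.cross_entropy(output, target, reduction='none')  # per-sample measure
        would_exit = ce.lt(threshold).logical_and(taken.logical_not())
        taken = taken.logical_or(would_exit)
        exit_counts[i] = int(would_exit.sum())
        preds = output.argmax(dim=1)
        exit_correct[i] = int((preds[would_exit] == target[would_exit]).sum())
    return exit_counts, exit_correct
</code></pre>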
<h3 id="example-command-lines">Example command lines</h3>
<p>We have provided examples for ResNets of varying sizes for both the CIFAR10 and ImageNet datasets. An example command line for training on CIFAR10 is:</p>
<pre><code class="bash">python compress_classifier.py --arch=resnet32_cifar_earlyexit --epochs=20 -b 128 \
--lr=0.003 --earlyexit_thresholds 0.4 --earlyexit_lossweights 0.4 -j 30 \
--out-dir /home/ -n earlyexit /home/pcifar10
</code></pre>
<p>And an example command line for ImageNet is:</p>
<pre><code class="bash">python compress_classifier.py --arch=resnet50_earlyexit --epochs=120 -b 128 \
--lr=0.003 --earlyexit_thresholds 1.2 0.9 --earlyexit_lossweights 0.1 0.3 \
-j 30 --out-dir /home/ -n earlyexit /home/I1K/i1k-extracted/
</code></pre>
<h3 id="heuristics">Heuristics</h3>
<p>The insertion of the exits is ad hoc, but some heuristic principles guide their placement and parameters. The earlier an exit is placed, the more aggressive it is, because it essentially prunes the rest of the network at a very early stage and thus saves a lot of work. However, if accuracy is to be preserved, a diminishing percentage of the data can be directed through an earlier exit.</p>
<p>Adding exits has a further benefit: during training, the modified network receives back-propagated losses from the exits that affect the earlier layers more substantially than the loss from the final exit alone. This effect mitigates problems such as vanishing gradients.</p>
<h3 id="early-exit-hyper-parameters">Early Exit Hyper-Parameters</h3>
<p>Two command-line arguments are required to enable Early Exit. Leave them undefined if you are not using Early Exit:</p>
<ol>
<li><strong>--earlyexit_thresholds</strong> defines the threshold for each of the early exits. The cross-entropy measure must be <strong>less than</strong> the specified threshold for the data to take that exit; otherwise the data continues along the regular path. For example, "--earlyexit_thresholds 0.9 1.2" implies two early exits with thresholds of 0.9 and 1.2, respectively.</li>
<li><strong>--earlyexit_lossweights</strong> provides the weights for the linear combination of losses during training that forms a single, overall loss. Weights are specified only for the early exits, and the weights of all exits (including the final exit) are assumed to sum to 1.0. For example, "--earlyexit_lossweights 0.2 0.3" implies two early exits weighted 0.2 and 0.3, respectively, leaving the final exit with a weight of 1.0 - (0.2 + 0.3) = 0.5. Studies have shown that weighting the early exits more heavily creates more aggressive early exits, but perhaps with a slight negative effect on accuracy. A sketch of this weighted loss combination is shown below the list.</li>
</ol>
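<p>The following is a minimal sketch of how such a weighted combination of per-exit losses could be formed during training, assuming one output tensor per exit. The helper name is hypothetical and this is not the exact Distiller code.</p>
<pre><code class="python">import torch.nn.functional as F

def weighted_earlyexit_loss(exit_outputs, target, lossweights):
    # Illustrative sketch (hypothetical helper): `lossweights` holds the weights
    # of the early exits only; the final exit receives the remaining weight so
    # that all weights sum to 1.0.
    final_weight = 1.0 - sum(lossweights)
    weights = list(lossweights) + [final_weight]
    loss = 0.0
    for output, weight in zip(exit_outputs, weights):
        loss = loss + weight * F.cross_entropy(output, target)
    return loss
</code></pre>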
<h3 id="output-stats">Output Stats</h3>
<p>The example code outputs various statistics regarding the loss and accuracy at each of the exits. During training, the Top1 and Top5 stats represent the accuracy as if all of the data were forced out through that exit (in order to compute the loss at that exit). During inference (i.e. the validation and test stages), the Top1 and Top5 stats represent the accuracy for only those data points that could take that exit, because the calculated entropy at that exit was lower than the specified threshold for that exit.</p>
<h3 id="cifar10">CIFAR10</h3>
<p>In the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The exit path itself includes a convolutional layer and a fully connected layer. If you move the exit, be sure to match the proper input and output sizes of the exit layers.</p>
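<p>As a point of reference, an exit branch of this kind (a convolutional layer followed by a fully connected layer) can be sketched as follows. The class name and layer sizes are hypothetical and not the exact definitions used in the example model; the channel count must match the feature map at the point where the exit is attached.</p>
<pre><code class="python">import torch.nn as nn

class ExitBranch(nn.Module):
    # Illustrative sketch of an early-exit branch: a small convolution, global
    # average pooling, and a fully connected classifier.
    def __init__(self, in_channels, num_classes=10):
        super(ExitBranch, self).__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        return self.fc(x)
</code></pre>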
<h3 id="imagenet">ImageNet</h3>
<p>The ImageNet example supports training and inference on the ImageNet dataset via several well-known deep architectures. ResNet-50 is the architecture of interest in this study; however, the exits are defined in the generic ResNet code and could be used with other ResNet sizes. Two exits are inserted in this example. Again, the exit layers must have properly matching input and output sizes.</p>
<h2 id="references">References</h2>
<div id="panda"></div>
<p><strong>Priyadarshini Panda, Abhronil Sengupta, Kaushik Roy</strong>.
<a href="https://arxiv.org/abs/1509.08971v6"><em>Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition</em></a>, arXiv:1509.08971v6, 2017.</p>
<div id="branchynet"></div>
<p><strong>Surat Teerapittayanon, Bradley McDanel, H. T. Kung</strong>.
<a href="http://arxiv.org/abs/1709.01686"><em>BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks</em></a>, arXiv:1709.01686, 2017.</p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="model_zoo.html" class="btn btn-neutral float-right" title="Model Zoo">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="algo_quantization.html" class="btn btn-neutral" title="Quantization"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="algo_quantization.html" style="color: #fcfcfc;">« Previous</a></span>
<span style="margin-left: 15px"><a href="model_zoo.html" style="color: #fcfcfc">Next »</a></span>
</span>
</div>
<script>var base_url = '.';</script>
<script src="js/theme.js" defer></script>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
<script src="search/main.js" defer></script>
</body>
</html>