-
Notifications
You must be signed in to change notification settings - Fork 11
/
Copy path020 Serial Performance.jl
245 lines (185 loc) · 6.7 KB
/
020 Serial Performance.jl
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
import Pkg; Pkg.activate(@__DIR__); Pkg.instantiate()
# # Fast (serial) programming with Julia
#
# Yes, this is a parallel computing course — but to write efficient parallel
# programs we first must learn how to write fast serial Julia code. This is
# a rapid primer in high performance (serial) programming.
#
# I _highly_ recommend reviewing the [Performance Tips](https://docs.julialang.org/en/v1.1/manual/performance-tips/)
# in the manual. This is only going to briefly introduce some of the main concepts.
#-
# ## Measure, measure, measure.
#
# It is very easy to experiment in Julia; you can rapidly try many options and
# see what is the fastest.
#-
# Use the [BenchmarkTools](https://github.com/JuliaCI/BenchmarkTools.jl) package:
using BenchmarkTools
"""
findclosest(data, point)
A simple example that returns the element in `data` that is closest to `point`
"""
function findclosest(data, point)
_, index = findmin(abs.(data .- point))
return data[index]
end
data = rand(5000)
findclosest(data, 0.5)
#-
@time findclosest(data, 0.5)
#-
@benchmark findclosest($data, $0.5)
# ### Profile!
using Profile
Profile.clear()
@profile for _ in 1:100000; findclosest(data, 0.5); end
Profile.print(maxdepth=11)
# ### Iterate!
#
# Before we had:
# ```julia
# function findclosest(data, point)
# _, index = findmin(abs.(data .- point))
# return data[index]
# end
# ```
#
# Let's come up with a new definition that can combine the two operations:
function findclosest2(data, point)
bestval = first(data)
bestdist = abs(bestval - point)
for elt in data
dist = abs(elt - point)
if dist < bestdist
bestval = elt
bestdist = dist
end
end
return bestval
end
## And do a spot-check to make sure we did the optimization correctly:
findclosest2(data, 0.5) == findclosest(data, .5)
#-
@benchmark findclosest2($data, $0.5)
# ## A quick word on macros
#
# Macros are those funny things starting with `@`. They can reinterpret what
# you write and do something different — essentially introducing a new keyword.
#
# For example, the `@assert` macro simply takes an expression and throws an
# exception if it returns `false`.
@assert 2+2 == 4
# It does this by literally re-writing what you wrote. You can see it in action
# with `@macroexpand`
@macroexpand @assert 2+2 == 4
# Each macro can define its own special syntax, and this is used extensively for
# code introspection, serial performance improvements, and — perhaps most
# importantly — parallelization perimitives!
#-
# ## How is Julia fast?
#
# By understanding the basics of how Julia _can_ be fast, you can get a better
# sense for how to write fast Julia code.
#
# Perhaps most importantly, Julia can reason about types. Recall: this is the definition of `findclosest2`:
#
# ```julia
# function findclosest2(data, point)
# bestval = first(data)
# bestdist = abs(bestval - point)
# for elt in data
# dist = abs(elt - point)
# if dist < bestdist
# bestval = elt
# bestdist = dist
# end
# end
# return bestval
# end
# ```
@code_typed optimize=false findclosest2(data, 0.5)
#-
typeof(data)
#-
newdata = Real[data...]
typeof(newdata)
#-
@code_typed optimize=false findclosest2(newdata, 0.5)
#-
@benchmark findclosest2(newdata, 0.5)
#-
@code_warntype findclosest2(newdata, 0.5)
# ### Type stability
#
# A function is called type-stable if Julia is able to infer what the output
# type will be based purely on the types of the inputs.
#
# Things that thwart type stability:
# * Running things in global scope: create functions instead!
# * Non-concretely typed containers
# * Structs with abstractly-typed fields
# * Non-constant globals (they might change!)
# * Functions that change what they return based on the _values_:
#-
# #### More on macros
#
# Each and every macro can define its own syntax. The `@benchmark` macro uses `$` in a special way.
# The goal behind `@benchmark` is to evaluate the performance of a code snippet
# as though it were written in a function. Use `$` to flag what will be an argument
# or local variable in the function. Forgetting to use `$`s may result in faster
# or slower timings than real-world performance.
x = 0.5 # non-constant global
@btime sin(x)
@btime sin($x)
#-
@btime sin(0.5) # constant literal!
@btime sin($0.5)
# ### Specializations
#
# Julia's reasoning about types is particularly important since it generates
# specialized machine code specifically for the given arguments.
@code_llvm 1 + 2
# This applies just the same to any functions we write — even the more complicated ones:
@code_llvm findclosest2(Float32[2.2,3.4,4.5],Float32(3.2))
# This applies just the same to any functions we write — even the more complicated ones:
remove_comments(s) = join(filter(x->!startswith(x, ";"), split(s, "\n")), "\n")
sprint(code_llvm, findclosest2, Tuple{Vector{Float32}, Int}) |> remove_comments |> print
# ## Modern hardware effects
#
# There are lots of little performance quirks in modern computers; I'll just
# cover two interesting ones here:
@benchmark findclosest2($data, $0.5)
#-
sorteddata = sort(data)
@benchmark findclosest2($sorteddata, $0.5)
# Unfortunately, this isn't demonstrable on a hardened cloud platform... because
# it's a huge security risk!
#
# * https://meltdownattack.com
# * https://discourse.julialang.org/t/psa-microbenchmarks-remember-branch-history/17436
idxs = sortperm(data)
sortedview = @view data[idxs]
@benchmark findclosest2($sortedview, $0.5)
# ### Memory latencies
#
# | System Event | Actual Latency | Scaled Latency |
# | ------------------------------ | -------------- | -------------- |
# | One CPU cycle | 0.4 ns | 1 s |
# | Level 1 cache access | 0.9 ns | 2 s |
# | Level 2 cache access | 2.8 ns | 7 s |
# | Level 3 cache access | 28 ns | 1 min |
# | Main memory access (DDR DIMM) | ~100 ns | 4 min |
# | Intel Optane memory access | <10 μs | 7 hrs |
# | NVMe SSD I/O | ~25 μs | 17 hrs |
# | SSD I/O | 50–150 μs | 1.5–4 days |
# | Rotational disk I/O | 1–10 ms | 1–9 months |
# | Internet call: SF to NYC | 65 ms | 5 years |
# | Internet call: SF to Hong Kong | 141 ms | 11 years |
#
# (from https://www.prowesscorp.com/computer-latency-at-a-human-scale/)
#-
# # Key Takeaways
#
# * Measure, measure, measure!
# * Get familiar with the [Performance Tips](https://docs.julialang.org/en/v1/manual/performance-tips/)
# * Don't be scared of `@code_typed`/`@code_warntype` and `@code_llvm`