Tell people to stop using global models #1085

Open · wants to merge 1 commit into master

Conversation

oxinabox
Member

People keep doing this and it is bad.
Having it in the performance tips should at least help.

But throughout the docs we violate this rule nearly constantly.
My bad example was basically copied straight from https://fluxml.ai/Flux.jl/stable/training/training/#Loss-Functions-1

But this PR is just going to add it to the performance tips.
It's beyond the scope of this PR to rewrite every example that violates it.
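For illustration (this snippet is not from the PR), a minimal sketch of the anti-pattern the tip targets next to the `const` fix, assuming the usual small Flux MLP:

```
using Flux

# Anti-pattern: `m` is a non-constant global, so `loss` is not type-stable.
m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)
loss(x, y) = Flux.mse(m(x), y)

# One fix: declare the model (and any other globals the loss reads) `const`.
const const_m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)
const_loss(x, y) = Flux.mse(const_m(x), y)
```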


#### Make the loss function actually close over `m`.
Closures can be very useful.
Contributor


Suggested change
Closures can be very useful.
[Closures](https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-captured-1) can be very useful.

Similarly anything else that is a non-constant global that is used in functions should also be made constant
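For illustration only, a hedged sketch of the closure pattern this hunk recommends, with the data also made `const`; the names (`make_loss`, `xs`, `ys`) are illustrative, not the PR's:

```
using Flux

# Build the loss as a closure over a local `m` instead of reading a global:
function make_loss(m)
    return (x, y) -> Flux.mse(m(x), y)   # `m` here is the captured argument, a local
end

m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)
loss = make_loss(m)

# Any other globals used inside functions (e.g. the data) should be `const` too:
const xs = randn(Float32, 784, 100)
const ys = softmax(randn(Float32, 10, 100))
loss(xs, ys)
```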

#### Put everything in a main function:
For more flexibility, you could even make this take `m` as a argument -- it doesn't matter of `m` was originally declared as a non-const global once it has been passed in as a argument because it then becomes a local variable.
Member


Suggested change
For more flexibility, you could even make this take `m` as a argument -- it doesn't matter of `m` was originally declared as a non-const global once it has been passed in as a argument because it then becomes a local variable.
For more flexibility, you could even make this take `m` as a argument -- it doesn't matter if `m` was originally declared as a non-const global once it has been passed in as a argument because it then becomes a local variable.
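Again for illustration (names are illustrative, not the PR's), a sketch of the "main function" pattern described above, with `m` passed in as an argument:

```
using Flux

function main(m, data)
    # Inside `main`, `m` and `data` are local variables, even if they were
    # non-const globals at the call site, so the closure below is type-stable.
    loss(x, y) = Flux.mse(m(x), y)
    Flux.train!(loss, Flux.params(m), data, Descent(0.1))
end

m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)   # non-const is fine here
x, y = randn(Float32, 784, 100), softmax(randn(Float32, 10, 100))
main(m, [(x, y)])
```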

@DhairyaLGandhi
Member

Do we link to https://docs.julialang.org/en/v1/manual/performance-tips/ appropriately too?


```
Flux.train!(loss, Flux.params(m), data, Descent(0.1))
```
Similarly anything else that is a non-constant global that is used in functions should also be made constant
Contributor


Perhaps explicitly call it data? Or inputs and outputs. That way most bases are covered; I can't imagine anything other than models and data being used in forward passes.

@bhvieira
Contributor

I'll be frank here: I didn't know declaring the models constant helped, since the parameters themselves are type-stable. I declare models inside functions, so I'm not hit by this when training, but I've been running some post-hoc analysis on trained models, so this should benefit me when loading them into memory, right? I'll try to benchmark this later.
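For illustration, a sketch of that workflow with a hypothetical `load_trained_model` standing in for whatever deserialization is actually used (BSON, JLD2, ...): either bind the loaded model to a `const`, or pass it into the analysis function so it becomes a local.

```
using Flux

# Hypothetical stand-in for however the trained model is deserialized:
load_trained_model() = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)

# Either bind the loaded model to a `const`...
const trained = load_trained_model()

# ...or pass it into the analysis function, where it becomes a local variable:
analyze(m, x) = sum(abs2, m(x))

x = randn(Float32, 784, 100)
analyze(trained, x)
```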

@bhvieira
Contributor

bhvieira commented Mar 27, 2020

```
using Flux
using BenchmarkTools

m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax);
const const_m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax);
loss(x, y) = Flux.mse(m(x), y);
const_loss(x, y) = Flux.mse(const_m(x), y);
x, y = (randn(Float32, 784, 100), softmax(randn(Float32, 10, 100)));
const const_x, const_y = (randn(Float32, 784, 100), softmax(randn(Float32, 10, 100)));

#tests without `$` interpolation, I guess it makes sense since we want to measure the effect of globals
@benchmark loss(x, y)

BenchmarkTools.Trial:
memory estimate: 46.88 KiB
allocs estimate: 20

minimum time: 228.200 μs (0.00% GC)
median time: 249.301 μs (0.00% GC)
mean time: 279.954 μs (1.77% GC)
maximum time: 5.604 ms (94.28% GC)

samples: 10000
evals/sample: 1

@benchmark loss(const_x, const_y)

BenchmarkTools.Trial:
memory estimate: 46.88 KiB
allocs estimate: 20

minimum time: 227.500 μs (0.00% GC)
median time: 242.500 μs (0.00% GC)
mean time: 266.806 μs (1.45% GC)
maximum time: 4.452 ms (94.59% GC)

samples: 10000
evals/sample: 1

@benchmark const_loss(x, y)

BenchmarkTools.Trial:
memory estimate: 46.88 KiB
allocs estimate: 20

minimum time: 228.600 μs (0.00% GC)
median time: 241.200 μs (0.00% GC)
mean time: 272.915 μs (1.30% GC)
maximum time: 3.870 ms (92.49% GC)

samples: 10000
evals/sample: 1

@benchmark const_loss(const_x, const_y)

BenchmarkTools.Trial:
memory estimate: 46.86 KiB
allocs estimate: 19

minimum time: 230.500 μs (0.00% GC)
median time: 242.700 μs (0.00% GC)
mean time: 280.045 μs (0.89% GC)
maximum time: 3.766 ms (0.00% GC)

samples: 10000
evals/sample: 1


@benchmark Flux.train!(loss, Flux.params(m), [(x,y)], Descent(0.1))

BenchmarkTools.Trial:
memory estimate: 744.63 KiB
allocs estimate: 10388

minimum time: 1.133 ms (0.00% GC)
median time: 1.239 ms (0.00% GC)
mean time: 1.414 ms (4.25% GC)
maximum time: 7.640 ms (0.00% GC)

samples: 3516
evals/sample: 1

@benchmark Flux.train!(const_loss, Flux.params(const_m), [(const_x,const_y)], Descent(0.1))

BenchmarkTools.Trial:
memory estimate: 744.59 KiB
allocs estimate: 10387

minimum time: 1.134 ms (0.00% GC)
median time: 1.233 ms (0.00% GC)
mean time: 1.437 ms (4.32% GC)
maximum time: 6.885 ms (0.00% GC)

samples: 3458
evals/sample: 1
```

@jondeuce
Contributor

jondeuce commented Jun 28, 2020

Echoing @bhvieira's thoughts, I'm not seeing much of a performance hit in practice, with the exception of very small models. Consider the following examples:

```
m_large = Flux.Chain(Flux.Dense(768,1024), Flux.Dense(1024,10), Flux.softmax)
m_small = Flux.Chain(Flux.Dense(1,1), Flux.Dense(1,1), Flux.softmax)
const const_m_large = deepcopy(m_large)
const const_m_small = deepcopy(m_small)

loss_large(x,y) = Flux.mse(m_large(x), y)
loss_small(x,y) = Flux.mse(m_small(x), y)
grad_large(x,y) = Flux.gradient(() -> loss_large(x,y), Flux.params(m_large))
grad_small(x,y) = Flux.gradient(() -> loss_small(x,y), Flux.params(m_small))
loss_const_large(x,y) = Flux.mse(const_m_large(x), y)
loss_const_small(x,y) = Flux.mse(const_m_small(x), y)
grad_const_large(x,y) = Flux.gradient(() -> loss_const_large(x,y), Flux.params(const_m_large))
grad_const_small(x,y) = Flux.gradient(() -> loss_const_small(x,y), Flux.params(const_m_small))

makexy(in, out, batch) = rand(Float32, in, batch), rand(Float32, out, batch)

@btime loss_large(x,y)       setup = ((x,y) = makexy(768, 10, 1024)); # 6.491 ms (19 allocations: 8.20 MiB)
@btime loss_const_large(x,y) setup = ((x,y) = makexy(768, 10, 1024)); # 6.515 ms (18 allocations: 8.20 MiB)

@btime grad_large(x,y)       setup = ((x,y) = makexy(768, 10, 1024)); # 18.681 ms (161 allocations: 18.54 MiB)
@btime grad_const_large(x,y) setup = ((x,y) = makexy(768, 10, 1024)); # 18.459 ms (157 allocations: 18.54 MiB)

@btime loss_small(x,y)       setup = ((x,y) = makexy(1, 1, 1)); # 1.059 μs (12 allocations: 944 bytes)
@btime loss_const_small(x,y) setup = ((x,y) = makexy(1, 1, 1)); # 1.021 μs (11 allocations: 928 bytes)

@btime grad_small(x,y)       setup = ((x,y) = makexy(1, 1, 1)); # 15.880 μs (133 allocations: 5.81 KiB)
@btime grad_const_small(x,y) setup = ((x,y) = makexy(1, 1, 1)); # 12.370 μs (129 allocations: 5.72 KiB) <--- Performance hit
```

Interestingly, the ~3 μs performance hit in the gradient of the small model is fairly robust to changes in model size.

To be honest, I'm not exactly sure of the point I'm trying to make; of course, declaring models as const is strictly better than not, and reinforcing this in the documentation (especially since it is good general Julia practice) is a good idea. Perhaps, however, it could be noted that for large computations the use of global models is unlikely to seriously degrade performance, since the model's internal parameters are themselves type-stable.

@bhvieira
Contributor

@jondeuce Yeah, I think that's the take-home message here. The microsecond difference might be due to the few extra allocations we get without const.

@CarloLucibello
Member

Given the above benchmarks, the tone here is too alarmist and should be moderated.
