Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster startup #484

Merged
merged 2 commits into from
May 22, 2020
Merged

Faster startup #484

merged 2 commits into from
May 22, 2020

Conversation

PallHaraldsson
Copy link
Contributor

No description provided.

@PallHaraldsson PallHaraldsson changed the title Faser startup Faster startup May 18, 2020
@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented May 18, 2020

Hi, this is the same code as Jeff added to Plots.jl, that I've been adding to other packages, except in your case I use -O0, as it gets me an extra second off.

Without (and even with) the startup was getting bad. Before you merge, be confident it's not a drawback at runtime (I doubt it is, but didn't test).

I only noticed the time really after the new shorter "using Revise" for startup.jl. I see before, I've been benchmarking other code (including startup of julia), and including I guess influence of Revise in startup.jl...

I know how not to, and personally, I'll change startup.jl to "@time using Revise" to remind me it's there. I'm not sure if there's some rule against making using print something out, but you should consider that, or only for the recommendation for startup.jl.

Your package (but not all) also has the second using issue:

$ ~/julia-1.4.0/bin/julia --startup-file=no
julia> @time using Revise
  4.830814 seconds (2.05 M allocations: 112.012 MiB, 0.33% gc time)

julia> @time using Revise
  1.012486 seconds (1.23 M allocations: 58.075 MiB, 3.33% gc time)

julia> @time using Revise
  0.000191 seconds (306 allocations: 16.359 KiB)

I'm a bit mystified why --startup-file=no isn't the default with julia -e "code". At least then Revise not needed, the only thing I use the file for. Can you think of a case where you would want it (the file) included? Should a PR be made at Julia? It's not like you can't ask for it on if you want.

@KristofferC
Copy link
Collaborator

Setting this to zero turns of almost all optimizations which will likely make things quite slow. Using 1 might be OK

(I doubt it is, but didn't test).

Just out of curiosity, what made you doubt it?

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented May 19, 2020

Right, -O0 without optimizations (using interpretation?) and -O1 with few. It's just I've seen code not that slow with either (but I wouldn't suggest either blindly for all code), and I thought this was such code. I mean I'm no expert on it, I noticed there are file-watchers (and no parsing of the files?) and I thought the code I/O bound. A type of code even Python is considered good for, and I'm not sure -O0 is slower than it.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented May 19, 2020

The latter commit has "@optlevel 1", while I'm testing locally with 0. It does get me 1 sec. off, but not as much as I wanted, or got with Julia CLI-option -O0, as the package has dependencies. I wouldn't dare suggest -O0 for JuliaInterpreter but I'm testing it with -O0 too and it lowers startup time further.

I think each dependency is included for each package separately (e.g. one package could have a different version).

Then I could see a theoretical possibility of a macro at the using level:

@useopt 0 using OrderedCollections

Do you think it would be difficult to implement?

@timholy
Copy link
Owner

timholy commented May 22, 2020

Thanks for the PR! And sorry to be slow, I've been immersed in a pretty deep dive and keeping out of sight for a while. It's all falling into place so I can come up for air.

For the record, I'm testing on a very recent Julia master (with a couple of custom commits, but they shouldn't affect anything) and startup.jl

if VERSION >= v"1.5.0-DEV.282"
    try
        using Revise
    catch e
        @warn(e.msg)
    end
else
    atreplinit() do repl
        try
            @eval using Revise
            @async Revise.wait_steal_repl_backend()
        catch e
            @warn(e.msg)
        end
    end
end

(on 1.5 you can just say using Revise, no more need for atreplinit).

On Revise master = v2.6.7 (which I see never made it to General due to DNS problems over the weekend), I get

tim@diva:~/.julia/dev/Revise$ time julia-master -e "println(Revise)"
Revise

real	0m4.377s
user	0m4.411s
sys	0m0.260s

which indeed is uncomfortably slow. With this PR,

tim@diva:~/.julia/dev/Revise$ time julia-master -e "println(Revise)"
Revise

real	0m3.684s
user	0m3.788s
sys	0m0.188s

which is indeed quite a lot nicer. A branch I'm working on and which should have a PR soon (it depends on JuliaDebug/LoweredCodeUtils.jl#30), I get

tim@diva:~/.julia/dev/Revise$ time julia-master -e "println(Revise)"
Revise

real	0m1.629s
user	0m1.713s
sys	0m0.208s

without this PR, and

tim@diva:~/.julia/dev/Revise$ time julia-master -e "println(Revise)"
Revise

real	0m1.397s
user	0m1.454s
sys	0m0.229s

with it. To be clear, currently I have no idea why that LoweredCodeUtils PR and the rewrite here is so consequential, but of course I'm very happy about it. And of course this PR improves things too, even on the upcoming faster version.

Let's also check the runtime performance. I'm not sure of the best way to do this, but one option is to time sigtests.jl since that doesn't define a lot of new code and just exercises Revise's parsing abilities. On Revise 2.6.7,

tim@diva:~/.julia/dev/Revise/test$ time julia-master sigtest.jl 
Test Summary:       | Pass  Total
:lambda expressions |    1      1
 17.747128 seconds (51.45 M allocations: 1.919 GiB, 2.17% gc time)

real	0m25.426s
user	0m25.424s
sys	0m0.293s

With this PR added,

tim@diva:~/.julia/dev/Revise/test$ time julia-master sigtest.jl 
Test Summary:       | Pass  Total
:lambda expressions |    1      1
 17.342171 seconds (51.47 M allocations: 1.920 GiB, 2.18% gc time)

real	0m24.083s
user	0m24.135s
sys	0m0.240s

so no performance hit. For fun I tried it on @optlevel 0 and got

tim@diva:~/.julia/dev/Revise/test$ time julia-master sigtest.jl 
Test Summary:       | Pass  Total
:lambda expressions |    1      1
 17.939664 seconds (51.46 M allocations: 1.920 GiB, 2.21% gc time)

real	0m25.088s
user	0m25.063s
sys	0m0.320s

which suggests even this should be safe (if my test is reasonable).

On my upcoming branch, I get

tim@diva:~/.julia/dev/Revise/test$ time julia-master sigtest.jl 
Test Summary:       | Pass  Total
:lambda expressions |    1      1
 14.656378 seconds (39.63 M allocations: 1.566 GiB, 2.95% gc time)

real	0m23.407s
user	0m23.437s
sys	0m0.264s

without this PR, and

tim@diva:~/.julia/dev/Revise/test$ time julia-master sigtest.jl 
Test Summary:       | Pass  Total
:lambda expressions |    1      1
 14.081439 seconds (39.64 M allocations: 1.566 GiB, 2.71% gc time)

real	0m22.380s
user	0m22.337s
sys	0m0.328s

with it.

Bottom line: this is a good change! I'm going to merge it. I'll register 2.6.7 as-is (without this PR), as it will be a useful benchmark for what I anticipate will be Revise 2.7.0 (which will include this and my PR).

@timholy timholy merged commit 953b638 into timholy:master May 22, 2020
@PallHaraldsson PallHaraldsson deleted the patch-1 branch May 23, 2020 15:27
@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented May 23, 2020

FYI:

same startup file with @time using Revise, only (do you do something different depending on what Julia version is running?).

consistent (twice in a row):

$ julia  # 32-bit! default Julia snap, i.e. 1.0.4
  2.858108 seconds (2.34 M allocations: 86.827 MiB, 4.02% gc time)

vs.

$ ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia 
 20.284627 seconds (2.05 M allocations: 115.133 MiB, 0.09% gc time)

$ ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia 
  6.880124 seconds (1.35 M allocations: 80.246 MiB, 0.21% gc time)

$ ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia 
  7.009159 seconds (1.35 M allocations: 80.246 MiB, 0.18% gc time)

then:

(@v1.6) pkg> add Revise#master  # should I doing for dependencies too, and/or rather pkg> update ?

[Immediately after, expected precompile text (and slowness) but didn't see.]
$ julia
 14.749793 seconds (2.88 M allocations: 105.839 MiB, 1.20% gc time)

$ julia
  2.862851 seconds (2.34 M allocations: 87.120 MiB, 3.77% gc time)



$ ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia 
 11.517190 seconds (2.02 M allocations: 113.748 MiB, 0.15% gc time)

$ ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia 
  5.883417 seconds (1.36 M allocations: 80.496 MiB, 0.21% gc time)

$ ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia 
  5.811331 seconds (1.36 M allocations: 80.496 MiB, 0.20% gc time)

$ ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia 
  5.835695 seconds (1.36 M allocations: 80.496 MiB, 0.21% gc time)

My 16-core, 128 GB, machine is somewhat (with firefox excl. sub-processes) taking 17% of memory.

top - 15:44:59 up 32 days, 18:43, 1 user, load average: 11,00, 10,56, 8,86
[..]
KiB Mem : 13182060+total, 26831344 free, 75168496 used, 29820760 buff/cache
KiB Swap: 2097148 total, 2095404 free, 1744 used. 51985916 avail Mem
23569 pharald+ 20 0 22,417g 0,017t 1,392g R 29,7 13,8 3551:21 firefox
23888 pharald+ 20 0 6243564 3,167g 319832 S 41,9 2,5 2795:28 Web Content
23692 pharald+ 20 0 8130804 4,713g 404544 S 37,6 3,7 2305:24 Web Content
23853 pharald+ 20 0 6686712 3,599g 550872 S 56,8 2,9 1712:44 Web Content
3436 pharald+ 20 0 4989560 2,705g 1,290g R 5,6 2,2 1357:10 Xorg
23792 pharald+ 20 0 7293292 3,722g 420448 R 21,1 3,0 1194:56 Web Content
23738 pharald+ 20 0 7791072 4,402g 536816 R 19,8 3,5 956:08.81 Web Content
23837 pharald+ 20 0 6554224 3,246g 453968 R 15,5 2,6 752:18.27 Web Content
23813 pharald+ 20 0 7210272 3,860g 620332 R 9,9 3,1 580:43.47 Web Content
23926 pharald+ 20 0 7242284 3,935g 500180 R 10,9 3,1 528:11.58 Web Content
19313 pharald+ 20 0 3272880 524840 223768 S 8,3 0,4 452:33.65 Web Content

sorted by VIRT (julia high, and slack, based on firefox):
1974 pharald+ 20 0 23,355g 442152 149996 S 2,0 0,3 17:08.62 slack
23569 pharald+ 20 0 22,429g 0,017t 1,364g S 18,1 13,8 3552:46 firefox
6063 pharald+ 20 0 9321600 6,986g 99044 S 0,0 5,6 1:55.18 julia

@JeffBezanson
Copy link
Contributor

FYI, optlevel only affects LLVM-level optimizations. We still do type inference, inlining, and our own optimizations; no interpreter at -O0. So it's generally "safe" to use -O1 and -O0 and indeed for a lot of code it's just a free latency improvement with no ill effects.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented May 23, 2020

A. Is there a way to control/lower/eliminate other optimizations (without PackageCompiler)? I only know of inlining options, global and otherwise. I see --inline=no makes startup slow, not having to consider it should make compilation faster. Probably startup is slower because the resulting code is run once.

Maybe aggressively adding @noinline to all functions except those known to be inlined would help? I mean a compiler step that outputs updated source code, or even functions with inline LLVM code (similar but not same as PackageCompiler, as would still be portable "source" code; if this is what precompiling more or less does already, I'm a bit puzzled why we do not get much better startup time).

B. FYI: I would not rule out using -O0 as I see a small yet predictable gain, for (dependencies of) Revise at least:

$ hyperfine '~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -O0 --startup-file=no -e "using Revise"'
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -O0 --startup-file=no -e "using Revise"
  Time (mean ± σ):      3.393 s ±  0.031 s    [User: 3.729 s, System: 0.541 s]
  Range (min … max):    3.331 s …  3.453 s    10 runs
 
$ hyperfine '~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -O1 --startup-file=no -e "using Revise"'
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -O1 --startup-file=no -e "using Revise"
  Time (mean ± σ):      3.447 s ±  0.047 s    [User: 3.772 s, System: 0.543 s]
  Range (min … max):    3.402 s …  3.551 s    10 runs
 
$ hyperfine '~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -O3 --startup-file=no -e "using Revise"'
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -O3 --startup-file=no -e "using Revise"
  Time (mean ± σ):      5.487 s ±  0.024 s    [User: 5.796 s, System: 0.571 s]
  Range (min … max):    5.446 s …  5.526 s    10 runs

those timings are for the latest Revise master, and I made sure to turn off firefox and some other programs. But Revise itself has a fixed lowered opt. (i.e. this is only for the dependencies of it; I didn't update) so I tried going to version 2.6.6 and then I get a bit more gap.

@JeffBezanson
Copy link
Contributor

JeffBezanson commented May 23, 2020

Is there a way to control/lower/eliminate other optimizations (without PackageCompiler)?

Not really; there is --compile=min which will try to run everything in the interpreter, but that is usually not practical. The other (non-LLVM) optimizations tend to pay for themselves. One thing that helps is turning down the precision of inference, but due to return_type and @inferred that can break people's code/tests, and it would be a bit unsettling to have an option that changes the language that much.

Maybe aggressively adding @noinline to all functions except those known to be inlined would help?

I doubt it; testing for inlineability doesn't take much time.

functions with inline LLVM code (similar but not same as PackageCompiler, as would still be portable "source" code; if this is what precompiling more or less does already

We don't yet save LLVM code; it would be nice to be able to do that. LLVM isn't source code though, e.g. it needs to be invalidated depending on what else has been loaded, might contain architecture-specific optimizations, etc. Also a huge amount of the time is in LLVM's native instruction selection, so for meaningful gains we probably need to save native code.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented May 23, 2020

I had totally forgotten about the (undocumented) --compile=min and I'm sure more people would want to know/be reminded of over 5x speedup (and over 3x faster than otherwise current optimized). Why is it undocumented again? I mean just under 1 sec. isn't great, just compared to the alternative, and it would be great to have this option as selective too.

$ hyperfine '~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -O0 --compile=min --inline=yes --startup-file=no  -e "using Revise"'
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -O0 --compile=min --inline=yes --startup-file=no  -e "using Revise"
  Time (mean ± σ):     985.8 ms ±  31.6 ms    [User: 1.369 s, System: 0.509 s]
  Range (min … max):   937.4 ms … 1022.1 ms    10 runs

$ hyperfine '~/julia-1.6.0-DEV-8f512f3f6d/bin/julia --startup-file=no -e "using Revise"'
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia --startup-file=no -e "using Revise"
  Time (mean ± σ):      5.414 s ±  0.053 s    [User: 5.712 s, System: 0.582 s]
  Range (min … max):    5.327 s …  5.511 s    10 runs

@timholy
Copy link
Owner

timholy commented May 23, 2020

Revise's startup, and latency to first revision, is something I worry about a lot and am trying to address. I have quite a few local customizations that help (latest startup time for me is now 1.25s), but it's still not great:

tim@diva:~/.julia/dev/Revise$ time julia-master -e 'includet("/home/tim/tmp/revise_data_code.jl")'

real	0m5.470s
user	0m5.500s
sys	0m0.264s

with all my best tweaks, and

tim@diva:~/.julia/dev/Revise$ time julia -e 'using Revise; includet("/home/tim/tmp/revise_data_code.jl")'

real	0m6.924s
user	0m7.014s
sys	0m0.204s

on Julia 1.4 and tagged versions of packages. It's a less-impressive difference than I could have hoped but certainly still something to celebrate.
(revise_data_code.jl is a irrelevant 4-line script I'm using just for convenience because includet triggers a lot of Revise's analysis machinery without forcing package recompilation the way that touching "real package" files does.)

I'd declare victory if I could get this down to about 1s, and we're a looong way from that. Without PackageCompiler I don't currently see a path to achieve that, unless (1) we can start caching LLVM or native code (and now that Revise is on the threshold of being completely non-invalidating, that's eliminated a major barrier), or (2) we start moving pieces of JuliaInterpreter/LoweredCodeUtils/Revise into Julia itself. To do that in a sensible way, (a) we'd probably need to merge JuliaInterpreter and interpreter.c into a single entity, or at least with a lot of shared infrastructure, and (b) add a few things to Revise and Base that will make the combination robust enough to deserve that kind of treatment.

If you had to pick (1) or (2), which seems more sensible? Obviously both are speculative and each is a big lift on its own. I don't know how to help with (1), but if it's not really on the table I think pretty soon we'll be in a position to contemplate (2). Revise still has some serious issues, but JuliaDebug/LoweredCodeUtils.jl#30 will likely be a game-changer. Exploiting it to its fullest will require a "contract" change for Revise that it will modify only methods and perhaps literal assignments like flag = false, and otherwise never touch your data, but personally I think that's where it needs to go anyway (it will probably allow me to soon fix #249, by far the deepest bug in Revise, and a whole host of closely related problems like making includet actually work). Other than that, just about the only other things I think it needs to be good at its job 😄 are (1) fix timholy/CodeTracking.jl#55 (easy), and (2) an ability to perform its revisions in order of module insertion (probably not quite so easy, but seems unlikely to be as hard as #249). Once those things are done, I think that aside from ironing out the inevitable stupid bugs, Revise will actually do what it should do and nothing more. It still won't handle type-redefinition or things like macros and generated functions for which we do not track backedges, but limitations are not the same thing as bugs---bugs are reasons for avoidance, limitations are just aspirations for the future.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented May 25, 2020

FYI: I did alias juli="~/julia-1.6.0-DEV-8f512f3f6d/bin/julia --compile=min -O1"

for julia-interactive, I didn't want to alias julia, in case I forget... for benchmarking reasons, and changed startup to be as yours plus @time.

It makes startup tolerable (I do it often), and I would want this --compile=min as an option in Juno (and I guess VSCode, should switch; and if I used Jupyther):

$ juli
  0.840181 seconds (748.74 k allocations: 48.758 MiB, 1.51% gc time)

$ ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia
  5.387995 seconds (1.38 M allocations: 81.845 MiB)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants