-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use SIMD.jl directly instead of LV.jl for fast_findmin()
#84
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #84 +/- ##
==========================================
+ Coverage 71.52% 71.91% +0.38%
==========================================
Files 17 17
Lines 1166 1182 +16
==========================================
+ Hits 834 850 +16
Misses 332 332 ☔ View full report in Codecov by Sentry. |
I will test this, but initial indications are that the performance on Apple silicon is rather bad |
hmm, I don't have access to Apple Silicon, but luckily we don't run JetReco on aarch64 in HEP yet |
Yeah, but it's where I develop! And people testing FCC workflows, for example, do use their Mac laptops... |
Yeah, it's quite a significant regression on Apple silicon. On my M2... main ~/.julia/dev/JetReconstruction/examples/ [main*] julia --project instrumented-jetreco.jl --algorithm=AntiKt -R 0.4 ../test/data/events.pp13TeV.hepmc3.gz -m 32
Processed 100 events 32 times
Average time per event 170.6484634375 ± 4.057617987022607 μs
Lowest time per event 166.17292 μs exorcise_lv ~/.julia/dev/JetReconstruction/examples/ [exorcise_lv] julia --project instrumented-jetreco.jl --algorithm=AntiKt -R 0.4 ../test/data/events.pp13TeV.hepmc3.gz -m 32
Processed 100 events 32 times
Average time per event 201.44234250000002 ± 3.8225514883321954 μs
Lowest time per event 197.07167 μs I need to test also on x86 and follow-up with further benchmarks of suggestions which came up in the Discourse thread. |
Benchmarks for x86_64, AMD Ryzen 7 5700G. mainpc-sft-fa-0a :: dev/JetReconstruction/examples ‹main› » julia --project instrumented-jetreco.jl --algorithm=AntiKt -R 0.4 ../test/data/events.pp13TeV.hepmc3.gz -m 32
Processed 100 events 32 times
Average time per event 184.5104475 ± 6.358035705629633 μs
Lowest time per event 176.97703 μs exorcise_lvpc-sft-fa-0a :: dev/JetReconstruction/examples ‹exorcise_lv› » julia --project instrumented-jetreco.jl --algorithm=AntiKt -R 0.4 ../test/data/events.pp13TeV.hepmc3.gz -m 32
Processed 100 events 32 times
Average time per event 184.13992875 ± 6.5172848018939655 μs
Lowest time per event 177.66260999999997 μs So it's really doing a good job on x86. We just have to find a way to make it also work well on Apple silicon. |
do you ever have NaN in this function? because: julia> @fastmath foldl(min, [1.0, NaN, 0.5])
0.5
julia> foldl(min, [1.0, NaN, 0.5])
NaN |
No, there can't be a |
ok yeah if you don't have |
There shouldn't be any zeros at all, so |
honestly, I might need to buy a Mac Mini just so I can test aarch64 performance.... |
They are quite sweet little machines (especially now they have M4s) - I am thinking about getting one myself! |
So I was thinking that if we do not find a generic solution, we can tolerate two ways to implement Though it would be good to understand if it's OS we should switch on or CPU arch - @aoanla would be able to help us run a few tests. |
Poor man's workaround: ssh to a aarch64 MacOS GitHub Runner. ;) |
btw, here's what LV.jl is doing: |
I know have a M4 mac mini to play with, and I can see the difference: (thisPR) [1]> julia --project src/benchmark.jl --backend Julia --repeats 30 data/events-pp-13TeV-20GeV.hepmc3.gz
1×5 DataFrame
Row │ File_path File mean_particles n_samples time_per_event
│ Any String Int64 Int64 Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ data/events-pp-13TeV-20GeV.hepmc… events-pp-13TeV-20GeV.hepmc3.gz -1 16 143.213
(main)> julia --project src/benchmark.jl --backend Julia --repeats 30 data/events-pp-13TeV-20GeV.hepmc3.gz
1×5 DataFrame
Row │ File_path File mean_particles n_samples time_per_event
│ Any String Int64 Int64 Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ data/events-pp-13TeV-20GeV.hepmc… events-pp-13TeV-20GeV.hepmc3.gz -1 16 131.523 |
fix #83