Fix missing SSE detection on x64 targets. Fixes #25 #26
base: master
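Below is a minimal sketch of the kind of check the PR title refers to. It assumes that detection is done with predefined compiler macros. The macro name `MYLIB_SIMD_SSE` and the function `simd_sse_available()` are hypothetical, not mathfu's actual identifiers. The relevant facts are:

- GCC and Clang define `__SSE__` on x86-64 by default.
- MSVC never defines `__SSE__`, and it defines `_M_IX86_FP` only for 32-bit x86 targets.

An x64 MSVC build therefore also has to check `_M_X64`/`_M_AMD64`. SSE2 is part of the baseline instruction set on those targets.

```cpp
// Hypothetical sketch: MYLIB_SIMD_SSE is an illustrative name, not mathfu's macro.
#if defined(__SSE__)
// GCC/Clang: defined whenever SSE is enabled, which is the default on x86-64.
#define MYLIB_SIMD_SSE 1
#elif defined(_M_X64) || defined(_M_AMD64)
// MSVC x64: __SSE__ and _M_IX86_FP are not defined, but SSE2 is always present.
#define MYLIB_SIMD_SSE 1
#elif defined(_M_IX86_FP) && _M_IX86_FP >= 1
// MSVC 32-bit x86 built with /arch:SSE or higher.
#define MYLIB_SIMD_SSE 1
#else
#define MYLIB_SIMD_SSE 0
#endif

// Compile-time query so callers can branch without touching the macro directly.
constexpr bool simd_sse_available() { return MYLIB_SIMD_SSE != 0; }
```

A check that relies only on `_M_IX86_FP` under MSVC would silently report "no SSE" on every x64 build. That matches the symptom described in this PR.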
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
Re the CLA: I'm a Google employee. I just registered my GitHub account, though, so approval may still be processing.
CLAs look good, thanks!
To be precise: it takes twice as long to run on x64, but we don't know how much slower it was before, since on x64 it wasn't using SIMD at all, right? Otherwise this change seems OK to me. Nothing in this PR causes a slowdown by itself, so I think it should go in. Finding the cause of the slowdown is a separate issue.
Not quite, no. You're correct that x64 was previously not using SIMD at all, whether or not it was enabled in the CMake options. What I meant is that now that x64 builds can use SIMD, enabling it actually slows down matrix_benchmarks by a factor of two compared with the SIMD-disabled x64 configuration. My concern is this: if the PR is merged as-is without identifying the cause of the slowdown, existing mathfu users who think they've had SIMD enabled all along will suddenly see a significant performance drop after integrating these changes.
I think the biggest culprit is likely the FPU <-> memory <-> SIMD conversion. With the current implementation, it can happen implicitly and be tricky to spot. As for copy constructors and temporary objects, the compiler does a pretty good job of eliminating those. https://en.wikipedia.org/wiki/Copy_elision
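To illustrate the round trip described above, here is a hedged sketch using standard SSE intrinsics from `<xmmintrin.h>`. The functions `add_via_scalars` and `add_in_registers` are hypothetical and not part of mathfu. They contrast two cases:

- A value is spilled from an SSE register to memory, processed with scalar float math, and reloaded.
- A value stays in a register throughout.

When everything is inlined, an optimizing compiler may vectorize the first version back into a packed add. The real cost therefore depends on inlining and optimization level, which fits the "tricky to spot" remark.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>

// Round-trip pattern: SIMD register -> memory -> scalar FPU math -> memory -> SIMD register.
// Per-element accessors on a SIMD-backed type can produce this shape implicitly.
__m128 add_via_scalars(__m128 a, __m128 b) {
  alignas(16) float fa[4];
  alignas(16) float fb[4];
  alignas(16) float out[4];
  _mm_store_ps(fa, a);  // spill register to memory
  _mm_store_ps(fb, b);
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = fa[i] + fb[i];  // one scalar add per lane
  }
  return _mm_load_ps(out);  // reload into a register
}

// Register-resident version: a single packed add, no trip through memory.
__m128 add_in_registers(__m128 a, __m128 b) { return _mm_add_ps(a, b); }
#endif
```

Both functions return the same values. Only the data movement differs, so the difference shows up in a profiler or benchmark rather than in test results.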
Oh, I just noticed that you did mention seeing the slowdown in the benchmark samples.
Lenovo ThinkPad P50 laptop. Before applying this PR:
After applying this PR:
One important note on this PR: with this change, matrix_benchmarks takes twice as long to run on my test system with SIMD enabled as it does with SIMD disabled. That seems... unintuitive. And worth investigating further.
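One way to narrow this down is to time a single operation in isolation. You would build the same file once with SIMD enabled and once with it disabled. The sketch below is not matrix_benchmarks. `Mat4`, `mul`, and `time_mul` are hypothetical stand-ins for a plain scalar, column-major 4x4 multiply. The chained multiplies write their result to `sink` so the optimizer cannot drop the loop.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// Hypothetical 4x4 matrix, column-major: element (row, col) lives at col * 4 + row.
using Mat4 = std::array<float, 16>;

// Plain scalar matrix multiply: r = a * b.
Mat4 mul(const Mat4& a, const Mat4& b) {
  Mat4 r{};
  for (std::size_t col = 0; col < 4; ++col) {
    for (std::size_t row = 0; row < 4; ++row) {
      float s = 0.f;
      for (std::size_t k = 0; k < 4; ++k) s += a[k * 4 + row] * b[col * 4 + k];
      r[col * 4 + row] = s;
    }
  }
  return r;
}

// Runs `iterations` chained multiplies and returns elapsed wall-clock seconds.
// The step matrix is a small translation, so values stay bounded; the final
// translation component is stored in `sink` to keep the loop observable.
double time_mul(std::size_t iterations, float& sink) {
  Mat4 m{};
  for (std::size_t i = 0; i < 4; ++i) m[i * 4 + i] = 1.f;  // identity
  Mat4 step = m;
  step[12] = 1e-6f;  // translation in x (column 3, row 0)
  auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i) m = mul(m, step);
  auto end = std::chrono::steady_clock::now();
  sink = m[12];
  return std::chrono::duration<double>(end - start).count();
}
```

In each configuration, you could replace `mul` with the library's own matrix multiply. That would give a like-for-like comparison of the SIMD and non-SIMD paths for exactly one operation.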