Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building 2.0.1 with AVX2 fails #277

Closed
smuzaffar opened this issue Sep 22, 2020 · 17 comments
Closed

Building 2.0.1 with AVX2 fails #277

smuzaffar opened this issue Sep 22, 2020 · 17 comments

Comments

@smuzaffar
Copy link

Hi,
I am trying to build mkFit 2.0.1 using GCC 8.4.0 on CentOS7 with avx2 flag but it fails to compile [a]. Is AVX2 not supported?

Thanks, --Shahzad

[a]

make -j 10 TBB_PREFIX=/data/cmsbld/vector/del/slc7_amd64_gcc820/external/tbb/2020_U2-ghbfee VEC_GCC=-mavx2
....
g++ -I..   -I. -DUSE_MATRIPLEX -DMPLEX_USE_INTRINSICS -DCHECKSTATEVALID      -I/data/cmsbld/vector/del/slc7_amd64_gcc820/external/tbb/2020_U2-ghbfee/include -DTBB -DNO_ROOT -std=c++1z -fPIC -g -O3  -std=c++1z -ftree-vectorize -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -Wstrict-overflow -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Wstrict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -fdiagnostics-color=always -fdiagnostics-show-option -pthread -pipe -fopenmp -mavx2 -c -o ConformalUtilsMPlex.o ConformalUtilsMPlex.cc
In file included from /data/cmsbld/vector/del/slc7_amd64_gcc820/external/gcc/8.2.0-bcolbf/lib/gcc/x86_64-unknown-linux-gnu/8.4.0/include/immintrin.h:97,
                 from ../Matriplex/MatriplexCommon.h:15,
                 from ../Matriplex/MatriplexSym.h:4,
                 from ../Matrix.h:114,
                 from ConformalUtilsMPlex.h:4,
                 from ConformalUtilsMPlex.cc:1:
/data/cmsbld/vector/del/slc7_amd64_gcc820/external/gcc/8.2.0-bcolbf/lib/gcc/x86_64-unknown-linux-gnu/8.4.0/include/fmaintrin.h: In function 'void mkfit::CFMap(const MPlexHH&, const MPlexHV&, mkfit::MPlexHV&)':
/data/cmsbld/vector/del/slc7_amd64_gcc820/external/gcc/8.2.0-bcolbf/lib/gcc/x86_64-unknown-linux-gnu/8.4.0/include/fmaintrin.h:63:1: error: inlining failed in call to always_inline '__m256 _mm256_fmadd_ps(__m256, __m256, __m256)': target specific option mismatch
 _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
 ^~~~~~~~~~~~~~~
In file included from ../Matriplex/MatriplexSym.h:4,
                 from ../Matrix.h:114,
                 from ConformalUtilsMPlex.h:4,
                 from ConformalUtilsMPlex.cc:1:
../Matriplex/MatriplexCommon.h:53:42: note: called from here
     #define FMA(a, b, v)  _mm256_fmadd_ps(a, b, v)
                           ~~~~~~~~~~~~~~~^~~~~~~~~
CFMatrix33Vector3.ah:36:13: note: in expansion of macro 'FMA'
       c_2 = FMA(a_8, b_2, c_2);
             ^~~

@makortel
Copy link
Collaborator

@smuzaffar IIRC the AVX2 should be enabled with

make ... AVX2=1

Does that work? (it essentially adds -mfma in addition to -mavx2)

@smuzaffar
Copy link
Author

passing AVX2=1 or uncommenting it in Makefile.config only pass -mavx2 to the compiler and it still failed. Using -mavx2 -mfma also did not work

@srlantz
Copy link
Collaborator

srlantz commented Sep 22, 2020

@smuzaffar What version of the mkFit code are you using? A couple of fixes to AVX2 went into PR #262, which was merged on 3/4/20. One of these fixes was to add -mfma when AVX2=1 is defined during make; this is what makes me suspect that you have an older version. But another fix that went into PR #262 took care of a problem with one of the avx2 intrinsics.

@slava77
Copy link
Collaborator

slava77 commented Sep 22, 2020

@smuzaffar What version of the mkFit code are you using?

the issue title and description says "2.0.1", IIUC, this is as of #241

@srlantz
Copy link
Collaborator

srlantz commented Sep 22, 2020

Thanks @slava77.

@smuzaffar, I wonder if this issue is resolved by substituting MatriplexCommon.h from HEAD of devel (which has the necessary changes compared to commit d9dfbda), and adding -mfma to the AVX2 options in Makefile.config?

@smuzaffar
Copy link
Author

should I try buiilding the head of devel branch?

@dan131riley
Copy link
Collaborator

Adding -mfma fixes the error you cited, but then it still fails with

../Matriplex/MatriplexCommon.h:47:56: error: there are no arguments to '_mm256_load_epi32' that depend on a template parameter, so a declaration of '_mm256_load_epi32' must be available [-fpermissive]
     #define GATHER_IDX_LOAD(name, arr)  __m256i name = _mm256_load_epi32(arr);
                                                        ^~~~~~~~~~~~~~~~~

which is fixed by the change to Matriplex/MatriplexCommon.h in #262, so basically you need #262, or as Steve said -mfma + Matriplex/MatriplexCommon.h from the HEAD of devel. We should make a new release sometime soon.

@smuzaffar
Copy link
Author

smuzaffar commented Sep 22, 2020

OK, head of devel branch allowed me to build with avx2, avx, avx512 .
@slava77 , any reason not to update mkfit to tip of devel branch in cmssw?

@slava77
Copy link
Collaborator

slava77 commented Sep 22, 2020

@slava77 , any reason not to update mkfit to tip of devel branch in cmssw?

I thought that it had to have an update in the cms-sw/cmssw code , but maybe I'm wrong.
@makortel please clarify if post-2.0.1 some updates in the cmssw source would be required.

@makortel
Copy link
Collaborator

please clarify if post-2.0.1 some updates in the cmssw source would be required.

(I had to remind myself) Yes, there are (because of #243).

I'd also really like to have #257 resolved for the next tagged version.

@VinInn
Copy link

VinInn commented Oct 7, 2020

This is the curse of "standard" AVX2. It is a very "minimal" instruction set.

I strongly advice to only use specific architectures namely
for "sse4.2" use -march=nehalem
for "avx2/fma" use -march=haswell
for "avx512" use -march=skylake-avx512
Those will enable the WHOLE instruction set.
please note that (at least for gcc) the latter does not imply avx512 vector instructions as the default is -mprefer-vector-width=256
to enable full avx512 use -march=skylake-avx512 -mprefer-vector-width=512
in general the latter is way slower than the former (at least on silver and gold)

note that nehalem, haswell works for amd as well.
on recent Rome default is ok, on previous "zen" -mprefer-vector-width=128 is most probably better (but haswell will work as well)

@slava77
Copy link
Collaborator

slava77 commented Oct 7, 2020

it sounds like it's important to allow overriding the flags defined in mkFit Makefile with an environment variable to avoid patching trackreco/mkFit for any possible architecture build.
Having this would then let users pick and choose as needed.

@dan131riley
Copy link
Collaborator

@slava77 what do you think we need that isn't provided by make VEC_GCC=-mavx2?

@slava77
Copy link
Collaborator

slava77 commented Oct 7, 2020

@slava77 what do you think we need that isn't provided by make VEC_GCC=-mavx2?

I'd not ask for more.
I guess this would then cover #277 (comment)
I somehow misunderstood that the comment was about internal setup in mkFit.

IIUC, what remains for the current release used in CMSSW is a change in the code (https://github.com/trackreco/mkFit/pull/262/files#diff-ff197123457dabd037284f15a683d9b9L39-L48) because enabling of some of the flags would not compile .

@srlantz
Copy link
Collaborator

srlantz commented Oct 7, 2020

@dan131riley @slava77 To fix Shahzad's issue in a manner consistent with PR #262, these two pieces should be sufficient: (1) make VEC_GCC="-mavx2 -mfma" and (2) the changes to MatriplexCommon.h at the link Slava sent.

Ultimately, we may want to follow Vincenzo's advice and use -march=haswell for AVX2, for both VEC_GCC and VEC_ICC. It may be a better choice because it would bring in -mavx2 -mfma and probably some other flags that are very basic to the AVX2 instruction set (which would be good for all of Intel's AVX2 architectures, e.g.). But for sure -mfma should accompany -mavx2; the pair is present in the current Makefile.config.

@srlantz
Copy link
Collaborator

srlantz commented Oct 9, 2020

It would be desirable to add a third item to the proposed patch, namely (3) in Makefile.config, change -qopemp to -qopenmp-simd, and -fopenmp to -fopenmp-simd. As mentioned in PR #279, this is sufficient to remove the dependency on the OpenMP runtime, which is a small problem for CMS.

@slava77
Copy link
Collaborator

slava77 commented Oct 15, 2020

So, #280 is supposed to resolve the issue from the mkFit side.
It is available for outside use as https://github.com/trackreco/mkFit/releases/tag/V2.0.1-1%2Bpr280

The integration for CMSSW is in cms-sw/cmsdist#6312
I suppose, once that PR is merged, this issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants