TensorFlow-2.3.1-fosscuda-2020a-Python-3.8.2.eb fails to build (apparent compilation error) #12330

Closed
patoddh opened this issue Mar 3, 2021 · 7 comments

patoddh commented Mar 3, 2021

Cannot build TensorFlow-2.3.1-fosscuda-2020a-Python-3.8.2. Below is the error as far as I can tell. I am attaching the gzipped log file. This is on CentOS 8.3.

ERROR: /grid/it/data/elzar/easybuild/build/TensorFlow/2.3.1/fosscuda-2020a-Python-3.8.2/TensorFlow/tensorflow-2.3.1/tensorflow/python/BUILD:501:11: C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
--
tensorflow/python/lib/core/bfloat16.cc: In function bool tensorflow::{anonymous}::Initialize():
tensorflow/python/lib/core/bfloat16.cc:664:36: error: no match for call to (tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [6], <unresolved overloaded function type>, const std::array<int, 3>&)

easybuild-TensorFlow-2.3.1-20210302.194515.yQwRu.log.gz

Author

patoddh commented Mar 31, 2021

Has anyone been able to check/reproduce this? I would like to know whether the problem is in the easyconfig or in my installation.

Contributor

Micket commented Mar 31, 2021

I did not come across this problem building on CentOS 7.

I looked in your build logs for any low-hanging fruit, like it picking up something from /usr/ when it shouldn't, but I can't see anything hinting at that.

The actual error you are getting here is a straight up compilation problem; GCC isn't figuring out the function prototype for CompareUFunc<Bfloat16EqFunctor> here
https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/python/lib/core/bfloat16.cc#L663
and can't match it to PyUFuncGenericFunction here:
https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/python/lib/core/bfloat16.cc#L637

Quite bizarre. tensorflow/tensorflow#41061 seems to indicate a problem with a too-new numpy (no idea how the hell that can affect this code, though).

Author

patoddh commented Apr 1, 2021

That helps, since I did have a numpy version newer than the one in the SciPy-bundle dependency. Why did I have that? Because I cannot build SciPy-bundle due to the issue in #11093.

So I need to install the packages in SciPy-bundle manually with pip, and comment out the SciPy-bundle dependency in easyconfigs.

I downgraded numpy to 1.18.3, as per the SciPy-bundle easyconfig, and it looks like I am past the above error. But now I hit a new one:

[work]$ grep -E '(Error:|ERROR:)'  /tmp/eb-6fqst6fd/easybuild-TensorFlow-2.3.1-20210401.103555.TYgoS.log
ERROR: /dev/shm/TensorFlow/2.3.1/fosscuda-2020a-Python-3.8.2/TensorFlow/tensorflow-2.3.1/tensorflow/python/keras/api/BUILD:137:19: Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2 failed (Exit 1): bash failed: error executing command 
ImportError: /dev/shm/TensorFlow/2.3.1/fosscuda-2020a-Python-3.8.2/tmp4n931q2q-bazel-tf/368ff9a2b4071831dc310eeff4866434/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZTVN6icu_669ErrorCodeE
ImportError: Traceback (most recent call last):
ImportError: /dev/shm/TensorFlow/2.3.1/fosscuda-2020a-Python-3.8.2/tmp4n931q2q-bazel-tf/368ff9a2b4071831dc310eeff4866434/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZTVN6icu_669ErrorCodeE
ERROR: /dev/shm/TensorFlow/2.3.1/fosscuda-2020a-Python-3.8.2/TensorFlow/tensorflow-2.3.1/tensorflow/python/tools/BUILD:226:10 Executing genrule //tensorflow:tf_python_api_gen_v2 failed (Exit 1): bash failed: error executing command 
[work]$

Build log attached below...

easybuild-TensorFlow-2.3.1-20210401.103555.TYgoS.log.gz
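
The numpy downgrade described in this comment can be checked before starting a long rebuild. The following is only a minimal sketch: the helper names are hypothetical (they are not part of EasyBuild or TensorFlow), and the cut-off at 1.19 is an assumption taken from this thread (1.18.3 got past the bfloat16 error, and 1.19 is the version named later as having changed the API).

```python
# Hypothetical pre-build check; not part of EasyBuild or TensorFlow.

# Assumption from this thread: numpy 1.18.3 works for TensorFlow 2.3.1,
# while numpy 1.19 and later trigger the bfloat16.cc compilation error.
FIRST_BROKEN = (1, 19)


def parse_version(version):
    """Turn '1.18.3' into (1, 18, 3); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy_ok_for_tf231(version):
    """Return True if the given numpy version is older than 1.19."""
    return parse_version(version) < FIRST_BROKEN


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed in this environment")
    else:
        status = "ok" if numpy_ok_for_tf231(numpy.__version__) else "too new"
        print("numpy", numpy.__version__, "->", status)
```

Running this inside the same Python environment that the TensorFlow build uses shows whether a manually installed numpy is newer than the one pinned in the SciPy-bundle easyconfig.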

boegel added this to the 4.x milestone Apr 2, 2021

Member

boegel commented Apr 2, 2021

@Flamefire Any ideas about this undefined symbol: _ZTVN6icu_669ErrorCodeE issue?

Member

branfosj commented Apr 2, 2021

_ZTVN6icu_669ErrorCodeE

I've seen this. It is the lib64 / lib directory issue. Rebuild ICU (ICU-66.1-GCCcore-9.3.0.eb) with the fix from easybuilders/easybuild-framework#3401 (or go into the ICU 66.1 directory and create the symlink). There is more discussion about this in #11546
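
For the manual workaround (creating the symlink by hand), a rough sketch might look like the following. The comment above does not say which direction the link should go, so this hypothetical helper simply links whichever of `lib` / `lib64` is missing to the one that exists. It is not the code from the referenced framework fix, and the installation prefix passed in is whatever your ICU 66.1 directory happens to be.

```python
import os


def link_missing_libdir(prefix):
    """Create a lib <-> lib64 symlink inside `prefix` if exactly one exists.

    Hypothetical helper for illustration only; it does not reproduce the
    EasyBuild framework fix. Returns the path of the new symlink, or None
    if nothing was changed.
    """
    lib = os.path.join(prefix, "lib")
    lib64 = os.path.join(prefix, "lib64")
    have_lib = os.path.isdir(lib)
    have_lib64 = os.path.isdir(lib64)

    # Both present or both missing: nothing sensible to do.
    if have_lib == have_lib64:
        return None

    existing_name, missing_path = ("lib", lib64) if have_lib else ("lib64", lib)

    # Do not overwrite anything already at the target (e.g. a broken link).
    if os.path.lexists(missing_path):
        return None

    # Relative link, so the installation directory stays relocatable.
    os.symlink(existing_name, missing_path)
    return missing_path
```

Calling `link_missing_libdir()` on an installation prefix that only has `lib64` creates `lib` as a symlink to it, and the other way round. Rebuilding ICU with the fixed framework, as suggested above, remains the cleaner route.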

@Flamefire
Contributor

Numpy changed its API somewhere around version 1.19, and yes, the ICU error shows that the build is not picking up the correct ICU. Rebuild ICU and it should work.
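
To see why the undefined symbol points at ICU, it helps to decode the mangled name. The sketch below handles only the simple case seen in this thread (a vtable symbol with a nested name made of length-prefixed components, as in the Itanium C++ ABI mangling scheme); it is a hypothetical illustration, not a general demangler. On a real system `c++filt` does this job.

```python
def decode_nested_vtable_symbol(symbol):
    """Decode symbols of the form _ZTVN<len><name>...E.

    Only covers the simple nested-name vtable case from this thread;
    anything else raises ValueError.
    """
    prefix = "_ZTVN"  # _Z = mangled name, TV = vtable, N = nested name
    if not (symbol.startswith(prefix) and symbol.endswith("E")):
        raise ValueError("not a simple nested vtable symbol: " + symbol)

    body = symbol[len(prefix):-1]
    names = []
    pos = 0
    while pos < len(body):
        # Each component is a decimal length followed by that many characters.
        start = pos
        while pos < len(body) and body[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("expected a length prefix in: " + symbol)
        length = int(body[start:pos])
        name = body[pos:pos + length]
        if len(name) != length:
            raise ValueError("truncated component in: " + symbol)
        names.append(name)
        pos += length

    return "vtable for " + "::".join(names)


print(decode_nested_vtable_symbol("_ZTVN6icu_669ErrorCodeE"))
# prints: vtable for icu_66::ErrorCode
```

The decoded name shows that `_pywrap_tensorflow_internal.so` expects the `ErrorCode` class from the `icu_66` namespace, so the library being picked up at link or load time is not the ICU 66.1 build.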

Author

patoddh commented Apr 6, 2021

This worked, thanks very much for the tip.

boegel closed this as completed Apr 16, 2021