This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

oneDNN 2 missing headerfiles #19690

Closed
leezu opened this issue Dec 17, 2020 · 18 comments · Fixed by #19706

@leezu
Contributor

leezu commented Dec 17, 2020

#19667 breaks Horovod (cf. horovod/horovod#2530) because some header files are missing from the pip wheel: /usr/local/lib/python3.6/dist-packages/mxnet/include/mkldnn/oneapi/dnnl/dnnl.hpp:23:10: fatal error: oneapi/dnnl/dnnl_config.h: No such file or directory

cc @bartekkuncer

@leezu
Contributor Author

leezu commented Dec 17, 2020

@sxjscience
Member

I've hit this error too. @bartekkuncer, is it possible to add the header to the pip wheel? Also pinging @szha.

@sxjscience
Member

As for the wheels, I think the last one that works is 20201214.

@bartekkuncer
Contributor

@leezu I believe that changing L149 to: shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/include/oneapi/dnnl'), fixes the issue, but as I am not familiar with Horovod, I do not know how to check whether the fix works. Can you provide me with a way to reproduce the issue?
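A minimal sketch of such a copy step, assuming the goal is to keep the oneapi/dnnl subdirectory intact under include/mkldnn so that #include "oneapi/dnnl/dnnl_config.h" resolves. The helper copy_onednn_headers and its three path arguments are hypothetical, not the actual contents of setup.py. It also assumes that generated headers such as dnnl_config.h live in a separate CMake build include directory, which has to be overlaid on top of the source headers.

```python
import os
import shutil


def copy_onednn_headers(src_include: str, build_include: str, dst_root: str) -> None:
    """Copy oneDNN headers into dst_root, preserving the oneapi/dnnl layout.

    Hypothetical arguments:
      src_include   -- the oneDNN source tree's include/ directory
      build_include -- the CMake build tree's include/ directory, assumed
                       to hold generated headers such as dnnl_config.h
      dst_root      -- e.g. <package>/include/mkldnn
    """
    # Copy the source headers, skipping the *.in templates that CMake
    # configures into real headers. dirs_exist_ok requires Python 3.8+.
    shutil.copytree(src_include, dst_root,
                    ignore=shutil.ignore_patterns("*.in"), dirs_exist_ok=True)
    # Overlay the generated headers at the same relative paths, so that
    # oneapi/dnnl/dnnl_config.h ends up next to oneapi/dnnl/dnnl.hpp.
    shutil.copytree(build_include, dst_root, dirs_exist_ok=True)
```

Copying whole trees rather than individual files keeps relative include paths intact. The second copy is the easy step to forget.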

@sxjscience
Member

@bartekkuncer I hit this error when trying to install Horovod (change the CUDA version as needed):

python3 -m pip install -U --pre "mxnet-cu102==2.0.0b20201217" -f https://dist.mxnet.io/python
HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_MPI=1 HOROVOD_WITH_MXNET=1 HOROVOD_WITHOUT_PYTORCH=1 HOROVOD_WITHOUT_TENSORFLOW=1 python3 -m pip install --no-cache-dir horovod

@szha
Member

szha commented Dec 18, 2020

@bartekkuncer I recommend the following for verification:

  • Download a wheel with the correct headers (e.g. the 20201214 one that @sxjscience mentioned) and one without (any wheel after 20201214).
  • Unzip both, examine the header contents, and work out where the headers should be.
  • Correct tools/pip/setup.py and use the wheel build scripts to build a correct wheel.
  • Verify that the headers are in the right location in the resulting wheel.
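The inspection steps can be scripted because a wheel is an ordinary zip archive. The following is a minimal sketch using only the standard library. REQUIRED is taken from the error path in this issue, and the function names are hypothetical.

```python
import zipfile

# Header that oneapi/dnnl/dnnl.hpp includes as "oneapi/dnnl/dnnl_config.h";
# the in-wheel location is inferred from the Horovod error in this issue.
REQUIRED = "mxnet/include/mkldnn/oneapi/dnnl/dnnl_config.h"


def list_headers(wheel_path: str) -> list:
    """Return every C/C++ header path stored in the wheel, sorted."""
    with zipfile.ZipFile(wheel_path) as whl:
        return sorted(n for n in whl.namelist() if n.endswith((".h", ".hpp")))


def check_wheel(wheel_path: str) -> bool:
    """True if the header needed by the Horovod build is in the wheel."""
    return REQUIRED in list_headers(wheel_path)


def missing_headers(good_wheel: str, bad_wheel: str) -> list:
    """Headers present in good_wheel but absent from bad_wheel."""
    return sorted(set(list_headers(good_wheel)) - set(list_headers(bad_wheel)))
```

Running check_wheel on a freshly built wheel covers the last step, without needing a full Horovod build.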

@leezu
Contributor Author

leezu commented Dec 18, 2020

Thanks @bartekkuncer! I opened #19694, since the weekend has already started in your time zone.

@leezu
Contributor Author

leezu commented Dec 21, 2020

Horovod now fails with

    In file included from /tmp/pip-req-build-bhade3mm/horovod/mxnet/mpi_ops.h:24:0,
                     from /tmp/pip-req-build-bhade3mm/horovod/mxnet/mpi_ops.cc:21:
    /usr/local/lib/python3.6/dist-packages/mxnet/include/mxnet/ndarray.h:41:10: fatal error: mkldnn.hpp: No such file or directory
     #include <mkldnn.hpp>
              ^~~~~~~~~~~~
    compilation terminated.
    horovod/mxnet/CMakeFiles/mxnet.dir/build.make:758: recipe for target 'horovod/mxnet/CMakeFiles/mxnet.dir/mpi_ops.cc.o' failed

@bartekkuncer
Contributor

Yes, I saw that; working on the fix.

@leezu
Contributor Author

leezu commented Dec 21, 2020

Thanks @bartekkuncer!

@bartekkuncer
Contributor

@leezu I think this PR #19706 should fix the problem.

bartekkuncer added a commit to bartekkuncer/incubator-mxnet that referenced this issue Dec 22, 2020
bartekkuncer added a commit to bartekkuncer/incubator-mxnet that referenced this issue Dec 23, 2020
@leezu
Contributor Author

leezu commented Dec 30, 2020

@bartekkuncer it looks like dnnl_config.h is packaged in the wrong directory. It should be at oneapi/dnnl/dnnl_config.h. At least, the Horovod build fails with: usr/local/lib/python3.6/dist-packages/mxnet/include/mkldnn/oneapi/dnnl/dnnl.hpp:23:10: fatal error: oneapi/dnnl/dnnl_config.h: No such file or directory (#include "oneapi/dnnl/dnnl_config.h")
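The location matters because of how a quoted #include is resolved. The sketch below is a simplified model of GCC's lookup. It first checks the directory of the including file, then each -I directory, and it ignores -iquote, system directories and #include_next. The function name is hypothetical, and the sketch assumes that the Horovod build passes mxnet/include/mkldnn as an -I directory. That assumption has not been verified here.

```python
import os
from typing import Optional, Sequence


def resolve_quoted_include(header: str, including_file: str,
                           include_dirs: Sequence[str]) -> Optional[str]:
    """Simplified model of how GCC resolves #include "header".

    Search order modeled here: the directory containing the including
    file first, then each -I directory in order.
    """
    for directory in [os.path.dirname(including_file), *include_dirs]:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None  # corresponds to "No such file or directory"
```

When dnnl_config.h exists only at the top level of include/mkldnn, neither lookup succeeds. Both candidate paths end in oneapi/dnnl/dnnl_config.h, so the header must exist under oneapi/dnnl/.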

@leezu
Contributor Author

leezu commented Dec 30, 2020

The files included in are

mxnet/include/mkldnn
mxnet/include/mkldnn/mkldnn_version.h
mxnet/include/mkldnn/dnnl_debug.h
mxnet/include/mkldnn/mkldnn_debug.h
mxnet/include/mkldnn/dnnl_ocl.h
mxnet/include/mkldnn/dnnl_sycl.h
mxnet/include/mkldnn/dnnl_ocl.hpp
mxnet/include/mkldnn/mkldnn_types.h
mxnet/include/mkldnn/dnnl_version.h
mxnet/include/mkldnn/oneapi
mxnet/include/mkldnn/oneapi/dnnl
mxnet/include/mkldnn/oneapi/dnnl/dnnl_debug.h
mxnet/include/mkldnn/oneapi/dnnl/dnnl_ocl.h
mxnet/include/mkldnn/oneapi/dnnl/dnnl_sycl.h
mxnet/include/mkldnn/oneapi/dnnl/dnnl_ocl.hpp
mxnet/include/mkldnn/oneapi/dnnl/dnnl_types.h
mxnet/include/mkldnn/oneapi/dnnl/dnnl.hpp
mxnet/include/mkldnn/oneapi/dnnl/dnnl_sycl_types.h
mxnet/include/mkldnn/oneapi/dnnl/dnnl_threadpool_iface.hpp
mxnet/include/mkldnn/oneapi/dnnl/dnnl_threadpool.hpp
mxnet/include/mkldnn/oneapi/dnnl/dnnl_sycl.hpp
mxnet/include/mkldnn/oneapi/dnnl/dnnl_threadpool.h
mxnet/include/mkldnn/oneapi/dnnl/dnnl.h
mxnet/include/mkldnn/mkldnn_config.h
mxnet/include/mkldnn/dnnl_types.h
mxnet/include/mkldnn/dnnl.hpp
mxnet/include/mkldnn/dnnl_config.h
mxnet/include/mkldnn/mkldnn.hpp
mxnet/include/mkldnn/mkldnn_dnnl_mangling.h
mxnet/include/mkldnn/dnnl_sycl_types.h
mxnet/include/mkldnn/dnnl_threadpool_iface.hpp
mxnet/include/mkldnn/dnnl_threadpool.hpp
mxnet/include/mkldnn/dnnl_sycl.hpp
mxnet/include/mkldnn/dnnl_threadpool.h
mxnet/include/mkldnn/mkldnn.h
mxnet/include/mkldnn/dnnl.h
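A set difference over this listing shows the gap directly. The two sets below are transcribed from the listing (dnnl_* headers only). The result names the two headers that exist at the top level of include/mkldnn but not under oneapi/dnnl/. In oneDNN 2.x, those two headers are generated at CMake configure time from .h.in templates rather than shipped ready-made in the source include tree. That would plausibly explain why a copy taken from the source tree misses them.

```python
# dnnl_* headers at the top level of mxnet/include/mkldnn (from the listing).
TOP_LEVEL = {
    "dnnl.h", "dnnl.hpp", "dnnl_config.h", "dnnl_debug.h", "dnnl_ocl.h",
    "dnnl_ocl.hpp", "dnnl_sycl.h", "dnnl_sycl.hpp", "dnnl_sycl_types.h",
    "dnnl_threadpool.h", "dnnl_threadpool.hpp", "dnnl_threadpool_iface.hpp",
    "dnnl_types.h", "dnnl_version.h",
}

# Headers under mxnet/include/mkldnn/oneapi/dnnl (from the listing).
ONEAPI_DNNL = {
    "dnnl.h", "dnnl.hpp", "dnnl_debug.h", "dnnl_ocl.h", "dnnl_ocl.hpp",
    "dnnl_sycl.h", "dnnl_sycl.hpp", "dnnl_sycl_types.h", "dnnl_threadpool.h",
    "dnnl_threadpool.hpp", "dnnl_threadpool_iface.hpp", "dnnl_types.h",
}

# Present at the top level but absent where oneapi/dnnl/*.h(pp) look for them.
missing = sorted(TOP_LEVEL - ONEAPI_DNNL)
print(missing)  # ['dnnl_config.h', 'dnnl_version.h']
```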

I think we may need to update https://github.com/apache/incubator-mxnet/blob/3c5beb3596b6bc01f77bc7ddd14ed90221c31950/cd/mxnet_lib/static/Jenkins_pipeline.groovy#L36 to ensure that the config files are stashed correctly in CD.

@szha
Member

szha commented Dec 31, 2020

We might also consider making setup.py more robust by asserting that these header files exist, instead of only including them when they are available.
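Such a check could look like the sketch below. The function name and the REQUIRED_HEADERS list are hypothetical and would have to match whatever setup.py actually copies. The sketch raises an exception instead of using a bare assert, because assert statements are skipped when Python runs with -O.

```python
import os

# Hypothetical list of headers the wheel must ship, relative to the
# package's include/mkldnn directory; mirrors the Horovod errors above.
REQUIRED_HEADERS = [
    os.path.join("oneapi", "dnnl", "dnnl.hpp"),
    os.path.join("oneapi", "dnnl", "dnnl_config.h"),
    os.path.join("oneapi", "dnnl", "dnnl_version.h"),
]


def assert_headers_present(include_root: str, required=REQUIRED_HEADERS) -> None:
    """Fail the wheel build loudly instead of silently skipping headers."""
    missing = [h for h in required
               if not os.path.isfile(os.path.join(include_root, h))]
    if missing:
        raise FileNotFoundError(
            "oneDNN headers missing under %s: %s"
            % (include_root, ", ".join(missing)))
```

Calling this right after the copy step would turn a broken wheel into a failed build, instead of a failure that only surfaces later in downstream projects such as Horovod.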

@bartekkuncer
Contributor

bartekkuncer commented Jan 4, 2021

Thanks @leezu. I changed it in CI but must have overlooked it in CD.

@leezu
Contributor Author

leezu commented Jan 4, 2021

It looks like there are still more issues with the CD. Horovod still fails with

    /usr/local/lib/python3.6/dist-packages/mxnet/include/mkldnn/oneapi/dnnl/dnnl.hpp:23:10: fatal error: oneapi/dnnl/dnnl_config.h: No such file or directory
     #include "oneapi/dnnl/dnnl_config.h"
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    horovod/mxnet/CMakeFiles/mxnet.dir/build.make:758: recipe for target 'horovod/mxnet/CMakeFiles/mxnet.dir/mpi_ops.cc.o' failed

@bartekkuncer
Contributor

Yes, I saw that. #19726 should fix the issue.

@leezu
Contributor Author

leezu commented Jan 6, 2021

Thank you @bartekkuncer!

@leezu leezu closed this as completed Jan 6, 2021