CI: Intel icc/icpc via oneAPI #2573

Merged
merged 22 commits into from
Jan 15, 2021

Conversation

ax3l
Collaborator

@ax3l ax3l commented Oct 10, 2020

Add testing for Intel icc/icpc via the oneAPI images.
Intel oneAPI is in a late beta stage, currently shipping oneAPI beta09 with ICC 20.2.

We are adding // [workaround(intel)] near the workarounds so that it will be easy to search for them later and disable them to check whether the Intel compiler team has fixed the underlying issues.

The workarounds are:

  • py::args() -> py::args{}: regression in ICC 20+; workaround applied in the tests
  • = default instead of {} doesn’t always work (this was a recent modernization, so it might have been a problem before too); workaround applied in the tests
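
A minimal sketch of how the two workarounds and their marker comment might look in test code. The types `Args`, `Base`, and `Derived` and the helper functions are hypothetical stand-ins, not pybind11 classes; only the `// [workaround(intel)]` marker and the two substitutions come from the list above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an argument pack type such as py::args.
struct Args {
    std::vector<int> values;
};

// Counts the arguments in a pack.
inline std::size_t count_args(const Args &a) { return a.values.size(); }

// Hypothetical base class with a virtual destructor.
struct Base {
    virtual ~Base() = default;
    virtual int foo() const { return 42; }
};

struct Derived : Base {
    // [workaround(intel)] = default does not always work here
    ~Derived() override {} // instead of: ~Derived() override = default;
    using Base::foo;
};

// [workaround(intel)] brace-initialize the temporary instead of writing Args()
inline std::size_t empty_call() { return count_args(Args{}); }

// Exercises the class that carries the destructor workaround.
inline int derived_foo() {
    Derived d;
    return d.foo();
}
```

With a marker like this in place, something like `grep -rn "workaround(intel)" include tests` would list every site to revisit once a newer compiler release is available.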

Suggested changelog entry:

* Support Intel OneAPI compiler (ICC 20.2) and add to CI.

@ax3l ax3l added enhancement ci related to the CI system labels Oct 10, 2020
@github-actions github-actions bot removed the ci related to the CI system label Oct 10, 2020
@ax3l ax3l force-pushed the topic-icc branch 15 times, most recently from c86a9ef to 35e91db Compare October 10, 2020 08:34
@ax3l ax3l force-pushed the topic-icc branch 2 times, most recently from 761c316 to 5fb34c7 Compare October 11, 2020 07:19
@henryiii
Collaborator

Have you tried running the same docker commands locally?

@ax3l
Collaborator Author

ax3l commented Oct 12, 2020

Yep, can reproduce on ubuntu:18.04 (ships Python 3.6.9).
Looks like every individual pytest segfaults.

When I am building in Debug mode, I get linker errors on the pybind11_tests.cpython-36m-x86_64-linux-gnu.so target with

CMakeFiles/pybind11_tests.dir/test_class.cpp.o:(.data.rel.ro.local+0x198): undefined reference to `test_submodule_class_(pybind11::module_&)::PublicistB::~PublicistB()'
CMakeFiles/pybind11_tests.dir/test_class.cpp.o:(.data.rel.ro.local+0x1a0): undefined reference to `test_submodule_class_(pybind11::module_&)::PublicistB::~PublicistB()'

Independent of that, when looking at the code, is it possible that ProtectedA lacks a virtual destructor?

Removing the ProtectedB destructor compiles though ^^ (Adding one to ProtectedA gives the same linker issue.)

Backtrace of the segfault:

Program received signal SIGSEGV, Segmentation fault.
0x00007f0a48c2b42e in pybind11::arg::noconvert (this=0x7f0a493740a0 <__$Ucd8>, flag=true) at /pybind11/include/pybind11/cast.h:1857
1857	    arg &noconvert(bool flag = true) { flag_noconvert = flag; return *this; }
(gdb) bt
#0  0x00007f0a48c2b42e in pybind11::arg::noconvert (this=0x7f0a493740a0 <__$Ucd8>, flag=true) at /pybind11/include/pybind11/cast.h:1857
#1  0x00007f0a48c2164b in test_submodule_builtin_casters (m=...) at /pybind11/tests/test_builtin_casters.cpp:145
#2  0x00007f0a48b953c1 in test_initializer::test_initializer(char const*, void (*)(pybind11::module_&))::{lambda(pybind11::module_&)#1}::operator()(pybind11::module_&) const (this=0x1a82570, parent=...)
    at /pybind11/tests/pybind11_tests.cpp:41
#3  0x00007f0a48bb2615 in std::_Function_handler<void (pybind11::module_&), test_initializer::test_initializer(char const*, void (*)(pybind11::module_&))::{lambda(pybind11::module_&)#1}>::_M_invoke(std::_Any_data const&, pybind11::module_&) (__functor=..., __args=...) at /usr/include/c++/7.5.0/bits/std_function.h:316
#4  0x00007f0a48baff4f in std::function<void (pybind11::module_&)>::operator()(pybind11::module_&) const (this=0x1a82570, __args=...) at /usr/include/c++/7.5.0/bits/std_function.h:706
#5  0x00007f0a48b98779 in _INTERNALdb8651f0::pybind11_init_pybind11_tests (m=...) at /pybind11/tests/pybind11_tests.cpp:90
#6  0x00007f0a48b98045 in PyInit_pybind11_tests () at /pybind11/tests/pybind11_tests.cpp:65
#7  0x00000000005fb2bf in _PyImport_LoadDynamicModuleWithSpec ()
#8  0x00000000005fb53d in ?? ()
#9  0x00000000005671ce in PyCFunction_Call ()
#10 0x0000000000511341 in _PyEval_EvalFrameDefault ()
#...
#150 0x00000000006390af in Py_Main ()
#151 0x00000000004b0dc0 in main ()

@henryiii
Collaborator

henryiii commented Oct 12, 2020

Storing a recipe for docker for my use:

apt-get update && apt install -y wget build-essential pkg-config cmake ca-certificates gnupg git && \
  wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2023.PUB && \
  apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2023.PUB && \
  echo "deb https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list && \
  apt-get update && \
  apt-get install -y intel-oneapi-dpcpp-cpp-compiler-pro cmake python3-dev python3-numpy python3-pytest python3-pip
source /opt/intel/oneapi/setvars.sh
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade pytest cmake

/usr/local/bin/cmake -S . -B build -DPYBIND11_WERROR=ON -DDOWNLOAD_CATCH=ON -DDOWNLOAD_EIGEN=ON -DCMAKE_CXX_COMPILER=$(which icpc) -DPYTHON_EXECUTABLE=$(which python3)
/usr/local/bin/cmake --build build -j 8

@henryiii
Collaborator

henryiii commented Oct 12, 2020

I am getting a few remarks:

remark #11074: Inlining inhibited by limit max-size
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
...
remark #11074: Inlining inhibited by limit max-size
remark #11074: Inlining inhibited by limit max-total-size
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo

And then I do get the segfault. (this is testing on the fix/intel branch, and using Ubuntu 18 docker)

In fact, this segfaults:

python3 -c "import pybind11_tests"

But this does not:

python3 -c "import pybind11_cross_module_tests"
python3 -c "import cross_module_gil_utils"

@wjakob
Member

wjakob commented Oct 12, 2020

Not having a virtual destructor in ProtectedA is fine: there is no polymorphism, only 1 POD member. TBH I wouldn't be surprised if this is just a miscompilation based on prior experience with ICPC..

@ax3l : When you make the change to avoid linker errors, is the behavior identical in Debug/Release mode?
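
To illustrate the point about ProtectedA, here is a small sketch of a non-polymorphic class with a single int member whose protected method is re-exported by a derived "publicist" class. The class bodies are an assumed reconstruction based only on the names mentioned in this thread; `PublicistA` and `call_foo` are hypothetical, and the real test code may differ.

```cpp
#include <cassert>
#include <type_traits>

// Assumed shape: no virtual functions, one plain int member.
class ProtectedA {
protected:
    int foo() const { return value; }

private:
    int value = 42;
};

// Hypothetical "publicist": re-exports the protected member as public.
class PublicistA : public ProtectedA {
public:
    using ProtectedA::foo;
};

// No polymorphism is involved, so no virtual destructor is required:
// nothing here is ever deleted through a pointer to the base class.
static_assert(!std::is_polymorphic<ProtectedA>::value,
              "ProtectedA has no virtual functions");
static_assert(!std::has_virtual_destructor<ProtectedA>::value,
              "ProtectedA has no virtual destructor");

// Calls the re-exported method on a stack object.
inline int call_foo() {
    PublicistA p;
    return p.foo();
}
```

A virtual destructor only becomes necessary when a derived object is destroyed through a base-class pointer, which this sketch never does.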

@ax3l
Collaborator Author

ax3l commented Oct 13, 2020

Interestingly, I don't get any linker errors in default builds (MinSizeRel?), but the same segfault.

Inlining limits: yes, I saw them as well... We can increase the limit and check if that mitigates it, although that would be a curious reason.

Maybe the problem is stemming from IPO?

Update: patched out some IPO flags - but still got the inlining/IPO remarks (but saw no IPO flags on the CLI):

diff --git a/tools/pybind11Common.cmake b/tools/pybind11Common.cmake
index 8ee22de..fb33bb5 100644
--- a/tools/pybind11Common.cmake
+++ b/tools/pybind11Common.cmake
@@ -315,8 +315,8 @@ function(_pybind11_generate_lto target prefer_thin_lto)
     endif()
   elseif(CMAKE_CXX_COMPILER_ID MATCHES "Intel")
     # Intel equivalent to LTO is called IPO
-    _pybind11_return_if_cxx_and_linker_flags_work(HAS_INTEL_IPO "-ipo" "-ipo"
-                                                  PYBIND11_LTO_CXX_FLAGS PYBIND11_LTO_LINKER_FLAGS)
+    #_pybind11_return_if_cxx_and_linker_flags_work(HAS_INTEL_IPO "-ipo" "-ipo"
+    #                                              PYBIND11_LTO_CXX_FLAGS PYBIND11_LTO_LINKER_FLAGS)
   elseif(MSVC)
     # cmake only interprets libraries as linker flags when they start with a - (otherwise it
     # converts /LTCG to \LTCG as if it was a Windows path).  Luckily MSVC supports passing flags

Same segfault persists. (Update2: changing visibility flags also did not change the segfault.)

@tobiasleibner
Contributor

tobiasleibner commented Nov 23, 2020

This was mentioned in #2679, so I had a look at the segfault. Apparently, it is a specific problem with the noconvert and none constructs. Calling

sm.def("accept_double_noconvert",
            [](py::array_t<double, 0>) {},
            py::arg("a").noconvert());

segfaults. When replacing this by

auto py_arg = py::arg("a");
sm.def("accept_double_noconvert",
       [](py::array_t<double, 0>) {},
       py_arg.noconvert());

the segfault is gone. Same for py::arg().none(). After commenting out all occurrences of py::arg().none() and py::arg().noconvert() in the tests,

python3 -c "import pybind11_tests"

does not segfault anymore and only gives the output

/opt/intel/oneapi/intelpython/latest/lib/python3.7/importlib/_bootstrap.py:219: FutureWarning: pybind11-bound class 'pybind11_tests.factory_constructors.TestFactory3' is using an old-style placement-new '__init__' which has been deprecated. See the upgrade guide in pybind11's docs. This message is only visible when compiled in debug mode.
  return f(*args, **kwds)
/opt/intel/oneapi/intelpython/latest/lib/python3.7/importlib/_bootstrap.py:219: FutureWarning: pybind11-bound class 'pybind11_tests.factory_constructors.NoisyAlloc' is using an old-style placement-new '__init__' which has been deprecated. See the upgrade guide in pybind11's docs. This message is only visible when compiled in debug mode.
  return f(*args, **kwds)
/opt/intel/oneapi/intelpython/latest/lib/python3.7/importlib/_bootstrap.py:219: FutureWarning: pybind11-bound class 'pybind11_tests.pickling.Pickleable' is using an old-style placement-new '__setstate__' which has been deprecated. See the upgrade guide in pybind11's docs. This message is only visible when compiled in debug mode.
  return f(*args, **kwds)
/opt/intel/oneapi/intelpython/latest/lib/python3.7/importlib/_bootstrap.py:219: FutureWarning: pybind11-bound class 'pybind11_tests.pickling.PickleableWithDict' is using an old-style placement-new '__setstate__' which has been deprecated. See the upgrade guide in pybind11's docs. This message is only visible when compiled in debug mode.
  return f(*args, **kwds)

The py::arg().noconvert() construct seems dangerous: py::arg() creates a temporary object, and the noconvert() call returns a mutable reference to this temporary. I don't know what the def function does with that reference, but I guess the Intel compiler discards the temporary before it is accessed.
Edit: As far as I understand the lifetime of temporaries, the py::arg().noconvert() construct should be fine though as the temporary will not be destroyed before the end of the full expression, i.e., the temporary should live long enough. So maybe this is an Intel compiler bug?
Edit2: Using py::arg{}.noconvert() instead of py::arg().noconvert() also fixes the segfault.
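
The lifetime argument in the first edit can be checked with a small self-contained sketch. `Arg`, `def`, `Observed`, and the `live_args` counter below are hypothetical stand-ins that only mimic the chaining pattern; they are not pybind11's implementation. On a conforming compiler the temporary is still alive while `def` runs and is destroyed at the end of the full expression.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Number of Arg objects currently alive (hypothetical instrumentation).
int live_args = 0;

// Hypothetical stand-in for py::arg: a named argument with a chainable setter.
struct Arg {
    std::string name;
    bool flag_noconvert = false;

    explicit Arg(std::string n) : name(std::move(n)) { ++live_args; }
    Arg(const Arg &) = delete;
    Arg &operator=(const Arg &) = delete;
    ~Arg() { --live_args; }

    // Returns a reference to *this, which may be a temporary.
    Arg &noconvert(bool flag = true) {
        flag_noconvert = flag;
        return *this;
    }
};

// What the callee saw while it was running.
struct Observed {
    bool noconvert;
    int live_during_call;
};

// Stand-in for a def-like function that reads the argument by reference.
Observed def(const Arg &a) { return {a.flag_noconvert, live_args}; }

// Form that segfaulted with ICC in this thread: chaining on a temporary.
Observed temporary_form() { return def(Arg("a").noconvert()); }

// Workaround form: chaining on a named object.
Observed named_form() {
    Arg a("a");
    return def(a.noconvert());
}
```

Both forms should observe exactly one live `Arg` during the call and none afterwards, so a difference between them would point at the compiler rather than at the pattern.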

ax3l added 3 commits January 3, 2021 20:42
Changed upstream with the last oneAPI release.
pytest 6 does not capture the `discard_as_unraisable` stderr and
just writes a warning with its content instead.
@henryiii henryiii added the compiler: intel Related to the Intel compilers label Jan 7, 2021
@henryiii henryiii force-pushed the topic-icc branch 6 times, most recently from 3ff528c to b753608 Compare January 14, 2021 16:38
@henryiii
Collaborator

I'm not totally sure that 20e467d is better; hoping it will get cached, and that might save some setup time. I'd probably recommend saving the commit and reverting it if it's not better (unless we like it better?).

@YannickJadoul
Collaborator

YannickJadoul commented Jan 14, 2021

> I'm not totally sure that 20e467d is better; hoping it will get cached, and that might save some setup time. I'd probably recommend saving the commit and reverting it if it's not better (unless we like it better?).

The "Building docker image" step took 4 minutes this time. How can we check whether it gets cached?

I don't think I like it better, though. Couldn't we do something like https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions#jobsjob_idcontainer, to run this job inside the container with ICC set up, but still have all the different steps individually (similar to the other jobs) ?

@henryiii
Collaborator

Sure, but we'd need a container, and I don't want to set something up just for this. If there's a way to add a job that builds a container and uploads it to the GH registry if it doesn't exist or needs updating, and then we pull from that registry, that could work, but it would be harder to set up. I think the local action gets cached automatically if unchanged.

I'm also thinking about how we expand this to C++17, but we could just do that as more steps in one job.

It's not all that bad, so we can just go with what we had (though if this works, I'd want to do it for PGI too, and that might make it more reliable since pulling it fails once in a while).

I can pull this commit out and make a new PR with it, probably better.

@henryiii
Collaborator

henryiii commented Jan 14, 2021

8 mins 27 seconds total, currently, for comparison, when not caching on the docker action method.

@henryiii
Collaborator

6m 55s for the non-docker version, 2+ mins in setup. Might investigate later, but not critical (and the job is not as long as I thought it was).

@henryiii
Collaborator

Got an okay from @wjakob, so I'm going to merge this, then work on #2729 and merge that; it will include a way to turn this behavior back on for testing with Intel (even if it's just instructions).

@henryiii henryiii merged commit 0b3df7f into pybind:master Jan 15, 2021
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jan 15, 2021
@ax3l ax3l deleted the topic-icc branch January 16, 2021 00:26
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Jan 25, 2021
Labels
compiler: intel Related to the Intel compilers enhancement

6 participants