Splinecage example crashes on OS X #273

Closed
rainman110 opened this issue May 24, 2016 · 13 comments

Comments

@rainman110
Collaborator

I know, @jf--- already brought this up, but I can't find the issue anymore.

The example crashes with an illegal instruction.

Steps to reproduce:

  • Download pythonocc via conda
  • Run core_geometry_splinecage.py from the examples directory

I analyzed the problem and have three observations:

  • It's an OCE specific problem.
  • The problems occur only in Release mode, not in Debug mode.
  • I narrowed it down to the file src/AdvApp2Var/AdvApp2Var_ApproxF2var.cxx (in particular the function mma2fnc_). Compiling only this file with optimization level -O1 "fixes" the problem, i.e. it no longer crashes.
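
The -O1 workaround above is a build-system setting applied to one file. As a rough in-source illustration of the same idea (reducing optimization for only the affected code), here is a minimal sketch using compiler pragmas. This is a different mechanism from the one used in this thread, the function `accumulate_squares` is a hypothetical stand-in for something like mma2fnc_, and note that the Clang pragma turns optimization off for the region rather than selecting -O1. Whether a given older Clang release supports the pragma is not verified here.

```cpp
// Reduce optimization for one region of a source file.
// Assumption: this only illustrates the idea behind the per-file -O1
// workaround; it is not the change that was applied to OCE or OCCT.
#if defined(__clang__)
// Clang: disables optimization for the functions that follow.
#  pragma clang optimize off
#elif defined(__GNUC__)
// GCC: compile the functions that follow as if -O1 were given.
#  pragma GCC push_options
#  pragma GCC optimize ("O1")
#endif

// Hypothetical stand-in for a heavily optimized numerical routine.
int accumulate_squares(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i * i;
    }
    return total;
}

// Restore the command-line optimization settings for the rest of the file.
#if defined(__clang__)
#  pragma clang optimize on
#elif defined(__GNUC__)
#  pragma GCC pop_options
#endif
```

Calling `accumulate_squares(4)` returns 30 regardless of the optimization level; only the generated machine code differs.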

I still need to understand why the code crashes. The code looks really complicated and might have been automatically translated from Fortran. Maybe the compiler (clang) gets the hiccups if we let it optimize too much.

@jf--- On what machine did you develop this example code? Was it also on a Mac, but it didn't crash on your machine? If yes, the problem might also depend on the compiler version.

@tpaviot
Owner

tpaviot commented May 24, 2016

@rainman110 FYI, it also crashes on my old 2009 MacBook Pro running Snow Leopard. I'll try compiling with the -O1 optimization flag to check that your tweak works. I'm not sure it's a compiler issue, since clang on Linux passes. I had a look at AdvApp2Var_ApproxF2var.cxx and can confirm it's a translation from a Fortran file; it's almost impossible to debug.

@tpaviot
Owner

tpaviot commented May 24, 2016

@rainman110 The only call to mma2fnc_ is from the AdvApp2Var_Iso.cxx file. Something interesting just before this function is called is this comment:

    // GCC 3.0 would not accept this line without the void
    // pointer cast.  Perhaps the real problem is a definition
    // somewhere that has a void * in it.
    AdvApp2Var_ApproxF2var::mma2fnc_(&NDIMEN,
                     &NBSESP,
                     &NDIMSE,

@jf---
Contributor

jf--- commented May 24, 2016

Wow, so OpenCASCADE has existed since Matra Datavision was working on a new version of Euclid (late 70s/80s, written in Fortran), called Euclid Quantum, which we now know as OCC.

Possibly this is a remnant of the ol' Euclid code base?
Interesting find...

@rainman110, more practically: yes, this does smell like a side effect of a clang optimization. I am usually up to date with Xcode, hence up to date with Clang (not sure if this is an "advantage" with OCC).

I don't think this section of OCC (however relevant...) sees a lot of updates. It's quite possible that the OCCT team overlooked this issue while updating the codebase to work with recent clang compilers?

@rainman110
Collaborator Author

@jf--- Maybe the OCCT team has overlooked this issue. The splinecage example is too large to pinpoint the problem. It would be cool if we could create a minimal C++ example and test it against both OCE (0.16) and the latest OCCT. The file AdvApp2Var_Iso.cxx has not been changed much in the official OCCT git, so I think the problem could also occur in OCCT (unless they tweaked the compiler settings to fix it).

@rainman110
Collaborator Author

rainman110 commented May 24, 2016

By the way, this looks a lot like our problem:
http://tracker.dev.opencascade.org/view.php?id=26778

To be more precise, this IS our problem. It seems that Roman Lygin found a workaround by applying

 __attribute__ ((noinline)) 

on the static functions of this file.
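
For readers unfamiliar with the attribute, here is a minimal sketch of what marking a file-local static function as non-inlinable can look like. The macro name `OCE_NOINLINE` and the functions `scaled_sum` and `compute_total` are hypothetical and exist only for this illustration; the actual patch is the one described in the OCCT tracker entry linked above.

```cpp
// Portable "do not inline" marker.
// Assumption: OCE_NOINLINE is a made-up name for this sketch and is not
// taken from the OCE or OCCT sources.
#if defined(__GNUC__) || defined(__clang__)
#  define OCE_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define OCE_NOINLINE __declspec(noinline)
#else
#  define OCE_NOINLINE
#endif

// Hypothetical stand-in for one of the static helper functions in the
// affected file. The attribute asks the compiler to keep it as a separate
// function instead of merging its body into callers.
static OCE_NOINLINE int scaled_sum(const int* values, int count, int factor)
{
    int sum = 0;
    for (int i = 0; i < count; ++i) {
        sum += values[i] * factor;
    }
    return sum;
}

// Caller: the helper is invoked through an ordinary call, so the optimizer
// does not fold its code into this function.
int compute_total()
{
    const int values[] = {1, 2, 3};
    return scaled_sum(values, 3, 2);
}
```

The attribute does not change the result of the computation (`compute_total()` still returns 12); it only constrains how the compiler may transform the code, which is why it can sidestep an optimizer bug without touching the logic.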

To sum this bug up:

  • It started with clang 3.4
  • Clang inserts the invalid instruction ud2 when it gets confused
  • The problem still exists in OCCT, but the OCCT developers cannot reproduce it because they lack Macs
  • Alternatively, -O1 can be used as a workaround

@jf---
Contributor

jf--- commented May 24, 2016

Wow, actual compiler bugs are pretty rare, interesting to see that...
@rainman110, that's a great find in the OCCT tracker...
Istvan Csanady also informs us that this crash is gone with the latest release of Xcode.

What's the best way to resolve things?

  • warn that the code is being compiled by a buggy Clang compiler
  • apply Roman's fix (there might be more instances where this is required?)

@rainman110
Collaborator Author

@jf--- I could test it with the newest Xcode. Still, I think this should be fixed inside the code, i.e. by using Roman's fix, in case someone compiles it with an older Xcode/clang.

Actually, compiler bugs are not that rare. In particular, my Fortran colleagues have trouble finding ANY Fortran compiler that works correctly with the newer Fortran (2003?) specification. They report compiler issues almost every week 😁

@rainman110
Collaborator Author

@tpaviot As this issue is OCE specific, we should do something on the OCE side.

We could

@tpaviot
Owner

tpaviot commented May 24, 2016

@rainman110 of course, please open an issue on the oce issue tracker

@rainman110
Collaborator Author

I can confirm that both workarounds seem to work:

  • Applying the fix of Roman Lygin works
  • Updating to the newest Xcode / Clang (Apple LLVM 7.3.0) also works

I'll produce new binaries with the new Xcode toolchain.

@jf---
Contributor

jf--- commented May 29, 2016

@rainman110 happy to hear so, looking fwd & thanks in advance for the new build!

@tpaviot
Owner

tpaviot commented May 30, 2016

@rainman110 I think we should include Roman's fix in oce, so that oce compiles and works on older Xcode/clang versions. Could you create a PR on the oce repository?

@rainman110
Collaborator Author

I updated the oce conda package for OS X. The splinecage example now works on my machine.

I'm closing this issue, since the example works and the remaining work only concerns OCE.

I'll make a PR for OCE. Until then, we'll leave the OCE related issue open.
