Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Ninja with nvcc_wrapper re-compiles sources every time #321

Closed
bradking opened this issue Apr 28, 2016 · 29 comments
Closed

Using Ninja with nvcc_wrapper re-compiles sources every time #321

bradking opened this issue Apr 28, 2016 · 29 comments
Labels
CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. Framework tasks Framework tasks (used internally by Framework team) MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot.

Comments

@bradking
Copy link
Contributor

bradking commented Apr 28, 2016

Discussion in #261 found the problem can be reproduced with stock CMake and Ninja, so I'm starting a new issue for it since it is independent of the custom (Fortran-supporting) CMake and Ninja versions reported there. Here are steps to reproduce on Linux:

$ export PATH=/path/to/Trilinos/packages/kokkos/config:$PATH
$ export OMPI_CC=gcc-4.9
$ export OMPI_CXX=nvcc_wrapper
$ export NVCC_WRAPPER_DEFAULT_COMPILER=g++-4.9
$ export CC=mpicc CXX=mpicxx

$ mpicxx -showme:version
mpicxx: Open MPI 1.10.2 (Language: C++)

$ nvcc --version
...
Cuda compilation tools, release 7.0, V7.0.27

$ cmake --version
cmake version 3.5.2

$ (cd ../Trilinos; git rev-parse HEAD)
9289740d06a6b708e9fc75cad9e2b71615e6dd66

$ cmake ../Trilinos -GNinja -DTrilinos_ENABLE_Fortran=OFF -DTrilinos_ENABLE_Gtest=ON -DTPL_ENABLE_Matio=OFF

$ ninja commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
[1/1] Building CXX object commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o

$ ninja -t deps commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o: #deps 1, deps mtime 1461877501 (VALID)
    /tmp/tmpxft_00003d40_00000000-14_gtest-all.ii

$ ninja -d explain commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
$ ninja -d explain commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
ninja explain: output /tmp/tmpxft_00003d40_00000000-14_gtest-all.ii of phony edge with no inputs doesn't exist
ninja explain: /tmp/tmpxft_00003d40_00000000-14_gtest-all.ii is dirty
[1/1] Building CXX object commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o

The compiler wrappers somehow use a temporary already-preprocessed .ii file as input to the g++-4.9 call. This causes the -MD option to generate a depfile listing only the temporary file. Ninja records this dependency and then cannot find it on the next build.

Tasks:

  1. Diagnose the problem [Done]
  2. Postulate a solution (see) [Done]
  3. Modify nvcc_wrapper to better generate dependencies for Ninja (see) ...
  4. Verify working with Trilinos+Drekar build ...
@bathmatt
Copy link
Contributor

this is what I get. looks like the same as you do

dir-nb (gcc) 11 $ ninja commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
[1/1] Building CXX object commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(7921): warning: variable "testing::internal::kPathSeparatorString" was declared but never referenced

/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(3756): warning: function "testing::::TestNameIs::operator()" was declared but never referenced

/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(7921): warning: variable "testing::internal::kPathSeparatorString" was declared but never referenced

/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(3752): warning: function "testing::::TestNameIs::TestNameIs(const char *)" was declared but never referenced

/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(3756): warning: function "testing::::TestNameIs::operator()" was declared but never referenced

dir-nb (gcc) 12 $ ninja -t deps commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o: #deps 1, deps mtime 1461878312 (VALID)
/tmp/tmpxft_0000fb07_00000000-14_gtest-all.ii

dir-nb (gcc) 13 $ ninja -d explain commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
ninja explain: output /tmp/tmpxft_0000fb07_00000000-14_gtest-all.ii of phony edge with no inputs doesn't exist
ninja explain: /tmp/tmpxft_0000fb07_00000000-14_gtest-all.ii is dirty
[1/1] Building CXX object commonTools/gtest/CMakeFiles/gtest.dir/gtest/gtest-all.cc.o
/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(7921): warning: variable "testing::internal::kPathSeparatorString" was declared but never referenced

/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(3756): warning: function "testing::::TestNameIs::operator()" was declared but never referenced

/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(7921): warning: variable "testing::internal::kPathSeparatorString" was declared but never referenced

/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(3752): warning: function "testing::::TestNameIs::TestNameIs(const char *)" was declared but never referenced

/home/mbetten/Trilinos/Trilinos/commonTools/gtest/gtest/gtest-all.cc(3756): warning: function "testing::::TestNameIs::operator()" was declared but never referenced

dir-nb (gcc) 14 $

@bartlettroscoe
Copy link
Member

CC: @bartlettroscoe

@crtrott
Copy link
Member

crtrott commented Apr 28, 2016

Btw Bart: this is not the compiler wrapper, but rather the NVCC two pass process. NVCC first goes through the file and generates effectively two new source files: one where all Cuda device specific code is removed, which then gets passed along to the host compiler (by default g++). The other file has only the Cuda device code which then is compiled by the thing generating PTX code (actually its a bit more complicated than that, but this is the basic gist). This causes all kinds of weird temporary files to pop up, which all kinds of tools accustomed to the normal GCC tool-chain have no idea what to do with.

@bartlettroscoe
Copy link
Member

It is good to know this is due to nvcc and not specific to nvcc_wrapper. That means that putting in a fix for this into CMake should be worth the effort and will benefit projects other than Trilinos and CASL VERA, etc. This is definitely worth fixing.

@crtrott
Copy link
Member

crtrott commented Apr 28, 2016

Yes Ross this is true. Also note that other labs are going down the same route as Sandia, i.e. having Cuda code in their main cpp files so that they must compile everything with NVCC.
Also, nvcc_wrapper is now a standalone repository, since the same labs (e.g. LLNL) are interested in using it independendly of Kokkos. That is because nvcc_wrapper at least solves the majority of issues you have if you try to use NVCC directly as the main compiler (if you want to know what I am talking about try setting OMPI_CXX=nvcc and compile Trilinos).

@bartlettroscoe
Copy link
Member

Also, nvcc_wrapper is now a standalone repository

Is this the repo:

https://github.com/kokkos/nvcc_wrapper

?

Note that when you snapshot into Kokkos, then you should use the snapshot script to record the SHA1 and origin repo info so you can trace the versions between the different reps. See the process at:

http://trac.trilinos.org/wiki/TriBITSTrilinosDev#update_snapshot

(Just replace "TriBITS" with "nvcc_wrapper" and "Trilinos" with "kokkos".)

@crtrott
Copy link
Member

crtrott commented Apr 28, 2016

Jup that is it and will do.

@bradking
Copy link
Contributor Author

this is not the compiler wrapper, but rather the NVCC two pass process. NVCC first goes through the file and generates effectively two new source files

It is this pass that should be responsible for generating the depfile for ninja via -MD -MT $obj -MF $depfile. These flags should not be passed on to the underlying compiler invocations because they are not given user-facing files directly. Whatever pass in nvcc actually preprocesses the original translation unit should respond to the flags and generate the depfile for Ninja. Fixing this may involve working with NVIDIA to get such support added to nvcc.

@bartlettroscoe
Copy link
Member

Fixing this may involve working with NVIDIA to get such support added to nvcc.

It would be great if NVIDIA could fix this but we cannot wait for NVIDIA since we have to use existing installations of CUDA (and we have to fix this now).

These flags should not be passed on to the underlying compiler invocations because they are not given user-facing files directly.

Would it be possible to set up nvcc_wrapper so that when it sees -MD -MT $obj -MF $depfile then it just calls the host compiler and not nvcc at all? That should result in tracing the include files to build the dependencies. Would that not do the trick?

@bradking
Copy link
Contributor Author

Would it be possible to set up nvcc_wrapper so that when it sees -MD -MT $obj -MF $depfile then it just calls the host compiler and not nvcc at all?

Yes, this should be possible. It can invoke the compiler and ask it to produce the depfile but not actually compile anything. Then strip these flags and do not pass them to nvcc or the compiler it invokes. I'm not familiar enough with nvcc_wrapper to provide more details though.

@bartlettroscoe
Copy link
Member

Would it be possible to set up nvcc_wrapper so that when it sees -MD -MT $obj -MF $depfile then it just calls the host compiler and not nvcc at all?

Yes, this should be possible. It can invoke the compiler and ask it to produce the depfile but not actually compile anything. Then strip these flags and do not pass them to nvcc or the compiler it invokes. I'm not familiar enough with nvcc_wrapper to provide more details though.

@crtrott, as the maintainer of nvcc_wrapper, would this be possible? I guess we could mess around with it and try? It is just a bash script:

https://github.com/kokkos/nvcc_wrapper/blob/master/nvcc_wrapper

On that point, are their any automated unit tests for nvcc_wrapper? We could use dummy programs for the host compiler and nvcc to make sure that the argument parsing and filtering is done correctly. There is actually a simple Python script in TriBITS that could help with this:

https://github.com/TriBITSPub/TriBITS/blob/master/tribits/python_utils/mockprogram.py

Alternatively, I see that there are dry_run and host_only variables that could be used to construct grep-style unit tests without needing to use something more complex like mockprogram.py.

If there are no unit tests for nvcc_wrapper (which is over 200 lines long), then I could help you set up some of these automated tests (using the Python unittest module to avoid dependencies on anything else). It would make me nervous to have a script that complex and that important without any unit tests.

@crtrott
Copy link
Member

crtrott commented Apr 29, 2016

There are no unit-tests currently. Essentially we are doing integration testing since it is used nightly to build Trilinos and Kokkos standalone. I think Mike also set up a nightly Sierra build using nvcc_wrapper, to check whether NVCC can handle it.

With regards to just invoking the host compiler: that is not an option. The source files contain Cuda extensions (such as device host) which the host compiler can not understand.

But maybe the solution is as simple as adding "-MD" to shared args without an argument and "-MF" to shared args with an argument in the script. We recently added -MT.

@bartlettroscoe
Copy link
Member

There are no unit-tests currently. Essentially we are doing integration testing since it is used nightly to build Trilinos and Kokkos standalone. I think Mike also set up a nightly Sierra build using nvcc_wrapper, to check whether NVCC can handle it.

Those integration tests are important obviously but that does not support test-driven development very well. It should not be hard to set up some basic unit tests that pin down the most critical behavior (just a couple of hours of work or less should do it). I can start the testing infrastructure if you are supportive of this and will maintain this. Once this is set up, anyone can safely change nvcc_wrapper without fear of breaking existing important use cases, and get that feedback very fast (i.e. seconds instead of hours or days).

With regards to just invoking the host compiler: that is not an option. The source files contain Cuda extensions (such as device host) which the host compiler can not understand.

If just the preprocessor of the host compiler is run, would it get tripped up by CUDA extensions? Does the pre-processor even care about C++ syntax?

But maybe the solution is as simple as adding "-MD" to shared args without an argument and "-MF" to shared args with an argument in the script. We recently added -MT.

What specifically are you suggesting? Can you give a concrete use case for how nvcc_wrapper -MD -MT $obj -MF $depfile would be translated into command lines to the host compiler and nvcc?

@crtrott
Copy link
Member

crtrott commented Apr 29, 2016

Ross feel to set something up. My feeling was so far that it is sufficiently complicated to get anything approaching significant test coverage, and until recently nvcc_wrapper was maintained inside of Kokkos. So before something got changed it got tested against a Kokkos build.

With regards to the flags I just checked. NVCC knows about MT but not about MF or MD. That means we already do the right thing. I.e. nvcc_wrapper will pass MF and MD only to the host compiler, MT goes to both. The only part which might be off is the order of MD MT and MF, but MT will keep its argument next to it. So I currently don't have a good idea what to do. You can try and add an option to give the file just to the host compiler if all of those flags are present (assuming you only run the preprocessor). I am not 100% confident that will work as intended. In particular the preprocessor will not define the defines NVCC automatically adds. That means you will have a miss identification of the compiler inside of Kokkos, which will probably screw up the include files and might even produce a preprocessor error via #error, since the Trilinos configuration is saying that we indeed try to compile for CUDA.

If that is the case (which I highly suspect) you might be able to extract all the defines we use from NVCC, and add them manually through nvcc_wrapper for a preprocessor run. But that is starting to get really messy.

@bartlettroscoe
Copy link
Member

Ross feel to set something up. My feeling was so far that it is sufficiently complicated to get anything approaching significant test coverage,

Okay, I will block off some time in the next few days to fork the nvcc_wrapper git repo and add a Python-based unit test harness to start testing some obvious use cases. Once the basic testing utilities are written, it will be easy to add new test cases and getting pretty good coverage (100% line coverage) as long as nvcc_wrapper only parses the input command-line args and then constructs command-line args for the host and nvcc compiler (i.e. does not create or read files itself, which it does not appear to be doing currently). After that I will add a PR for this.

With regards to the flags I just checked. NVCC knows about MT but not about MF or MD. That means we already do the right thing. I.e. nvcc_wrapper will pass MF and MD only to the host compiler, MT goes to both. The only part which might be off is the order of MD MT and MF, but MT will keep its argument next to it. So I currently don't have a good idea what to do. You can try and add an option to give the file just to the host compiler if all of those flags are present (assuming you only run the preprocessor). I am not 100% confident that will work as intended. In particular the preprocessor will not define the defines NVCC automatically adds. That means you will have a miss identification of the compiler inside of Kokkos, which will probably screw up the include files and might even produce a preprocessor error via #error, since the Trilinos configuration is saying that we indeed try to compile for CUDA.

That is unfortunate.

If that is the case (which I highly suspect) you might be able to extract all the defines we use from NVCC, and add them manually through nvcc_wrapper for a preprocessor run. But that is starting to get really messy.

I agree that this is not a very nice game to play. But with some introspective configure-time tests, it should be possible to figure out what defines NVCC is using for a given platform. Then you could just pass those through the invocation of nvcc_wrapper automatically with TriBITS. Does that make sense? Anyway, I will need to get set up to invoke nvcc_wrapper on my own so that I can see these issues myself. I will need to do this anyway as part of #172 (and take this burden off of @bathmatt).

@bradking, given all of this, would it be possible to make the CMake parsing of the output from nvcc more robust without having to do lots of work with nvcc_wrapper? If that is possible, then that is likely the best short-term solution given there will be other projects that will want to use CMake+Ninja+CUDA/nvcc without the help of nvcc_wrapper.

@bradking
Copy link
Contributor Author

bradking, given all of this, would it be possible to make the CMake parsing of the output from nvcc more robust without having to do lots of work with nvcc_wrapper?

CMake is not parsing anything in this case. The generated depfile is loaded directly by Ninja. Also, by the time nvcc returns the temporary .ii file is gone so there is no way to read it to extract any information about the original sources. This really needs to be solved on the nvcc/nvcc_wrapper side. There is nothing CMake or Ninja can do about it because the information they need is not available.

I was able to get nvcc to compile the source and produce a depfile by invoking it twice:

$ >test.cc
$ nvcc -MT test.cc.o -o /dev/null -Xcompiler -MD,-MF,test.cc.o.d -x cu -E test.cc
$ nvcc -o test.cc.o -x cu -c test.cc
$ head -3 test.cc.o.d
test.o: test.cc /usr/include/stdc-predef.h /usr/include/cuda_runtime.h \
 /usr/include/host_config.h /usr/include/features.h \
 /usr/include/x86_64-linux-gnu/sys/cdefs.h \

One could teach nvcc_wrapper to do this when it sees -MD (or -MMD) and then work with NVIDIA to fix future nvcc versions to support the flag properly.

@crtrott
Copy link
Member

crtrott commented Apr 29, 2016

Yeah that might be worth a try.

@bradking
Copy link
Contributor Author

Actually in my previous example we should not be giving -MT to nvcc either because it is the underlying compiler that is generating the depfile. In order to get the right value on the left hand side of the : in the depfile we need to teach nvcc_wrapper to do this instead:

$ >test.cc
$ nvcc -o /dev/null -x cu -E test.cc -Xcompiler -MD,-MT,test.cc.o,-MF,test.cc.o.d
$ nvcc -o test.cc.o -x cu -c test.cc
$ head -3 test.cc.o.d
test.cc.o: test.cc /usr/include/stdc-predef.h /usr/include/cuda_runtime.h \
 /usr/include/host_config.h /usr/include/features.h \
 /usr/include/x86_64-linux-gnu/sys/cdefs.h \

@bartlettroscoe
Copy link
Member

I think the next action is to have someone update nvcc_wrapper to do what @bradking suggests in the above comment. If no one gets to that, I was going to give that a try. But I am getting hammered to get some other things done so I have not been able to get this this yet. I will add an explicit set of tasks for this for now.

@bathmatt
Copy link
Contributor

I guess I don't understand what needs to be done. Are you saying you just
need to change the behaviour to run nvcc twice, once to create the .d file,
and once to create the obj file/

$ nvcc -o /dev/null -x cu -E test.cc -Xcompiler
-MD,-MT,test.cc.o,-MF,test.cc.o.d
$ nvcc -o test.cc.o -x cu -c test.cc

On Thu, May 12, 2016 at 12:50 PM, Roscoe A. Bartlett <
[email protected]> wrote:

I think the next action is to have someone update nvcc_wrapper to do what
@bradking https://github.com/bradking suggests in the above comment
#321 (comment).
If no one gets to that, I was going to give that a try. But I am getting
hammered to get some other things done so I have not been able to get this
this yet. I will add an explicit set of tasks for this for now.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#321 (comment)

@bartlettroscoe
Copy link
Member

Are you saying you just need to change the behaviour to run nvcc twice, once to create the .d file, and once to create the obj file/

Yes, I think that is what @bradking is suggesting. We just need to have someone see if this works with nvcc_wrapper. It will be a little while before I can get set up to try this myself to experiment with this.

@bartlettroscoe
Copy link
Member

@bradking, now that SNL has a contract with Kitware to help support Trilinos at SNL, do you think you might look into updating the nvcc_wrapper to address problem with Ninja dependencies as per your above comment?

Otherwise, I will wait until we have Ninja Fortran support integrated into CMake 'master' and a more standard way to install Fortran-enabled Ninja executables.

@bradking
Copy link
Contributor Author

@bartlettroscoe see kokkos/nvcc_wrapper#5 for a proposed fix to the nvcc_wrapper script for this.

@bartlettroscoe
Copy link
Member

see kokkos/nvcc_wrapper#5 for a proposed fix to the nvcc_wrapper script for this.

@bradking, thanks for putting this together.

@bathmatt, do you have a current Trilinos configure script for one of the ATTB CUDA machines set up so that we can try this patch provided by Brad?

@bartlettroscoe
Copy link
Member

@bathmatt, do you have a current Trilinos configure script for one of the ATTB CUDA machines set up so that we can try this patch provided by Brad?

@bathmatt, do you have a current Trilinos configure script for one of the ATTB CUDA machines set up so that we can try this patch provided by Brad in kokkos/nvcc_wrapper#5? If you do, then Joe F. or I can give it a try on hansen or shiller.

@crtrott, do you have any objection to the changes in kokkos/nvcc_wrapper#5? After we are able to test this with Ninja, then is this something that you would accept?

@mhoemmen mhoemmen added the Framework tasks Framework tasks (used internally by Framework team) label Nov 29, 2016
@bartlettroscoe
Copy link
Member

@bathmatt recently mentioned that this is still a problem on machines using nvcc and ninja. He states this is a significant productivity issue for those builds. The PR kokkos/nvcc_wrapper#5 now has merge conflicts so we will need to update nvcc_wrapper again in order to retry these.

I think the best time to address this might be when we get a standard ATDM configuration of Trilinos running (which will be happening soon). When we get on a machine with a CUDA build using nvcc_wrapper we will try out ninja and revisit this patch in kokkos/nvcc_wrapper#5 and work to get it merged into the offical source repo for nvcc_wrapper. Once we get to that point, I will create a new issue in https://gitlab.kitware.com/snl/project-1/.

@bradking
Copy link
Contributor Author

kokkos/nvcc_wrapper#5 has been merged.

@github-actions
Copy link

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Dec 15, 2020
@github-actions
Copy link

This issue was closed due to inactivity for 395 days.

@github-actions github-actions bot added the CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. label Jan 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. Framework tasks Framework tasks (used internally by Framework team) MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot.
Projects
None yet
Development

No branches or pull requests

5 participants