Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'#line 1' from flex generated apr_scanner.cpp SEACAS file causing unnecessary rebuilds when using Ninja + CUDA-8 #2359

Closed
bartlettroscoe opened this issue Mar 9, 2018 · 5 comments
Labels
client: ATDM Any issue primarily impacting the ATDM project PA: Data Services Issues that fall under the Trilinos Data Services Product Area pkg: seacas

Comments

@bartlettroscoe
Copy link
Member

bartlettroscoe commented Mar 9, 2018

CC: @trilinos/seacas, @fryeguy52

Description

The following line in the SEACAS file:

packages/seacas/libraries/aprepro_lib/apr_scanner.cc:974:#line 1 "/scratch/gdsjaar/seacas/packages/seacas/libraries/aprepro_lib/aprepro.ll"

is causing the ninja CUDA build for Trilinos with SEACAS to rebuild the same files every time, even when everything should be up-to-date.

The cause for this is very strange. There is a defect with the nvcc program (at least CUDA 8.0.61) when running nvcc -M -Wno-deprecated-gpu-targets on a file that has #line 1 <some-file> in the file results in nvcc incorrectly spitting out a dependency on <some-file>. This results in nvcc when run under nvcc_wrapper run under ninja to produce a dependency on the file <some-file>. Apparently this does not happen for any number other than 1. For example, lines with #line 2 <some-file> do not cause nvcc to print out a bad dependency.

Kitware reported this NVIDIA as bug number 2080680 just today. (However, their issue tracker is not public).

The workaround for this is to make the single one-line change:

diff --git a/packages/seacas/libraries/aprepro_lib/apr_scanner.cc b/packages/seacas/libraries/aprepro_lib/apr_scanner.cc
index 747470a..a2512cd 100644
--- a/packages/seacas/libraries/aprepro_lib/apr_scanner.cc
+++ b/packages/seacas/libraries/aprepro_lib/apr_scanner.cc
@@ -971,7 +971,6 @@ static yyconst flex_int16_t yy_rule_linenum[102] = {
 #define yymore() yymore_used_but_not_detected
 #define YY_MORE_ADJ 0
 #define YY_RESTORE_YY_MORE_OFFSET
-#line 1 "/scratch/gdsjaar/seacas/packages/seacas/libraries/aprepro_lib/aprepro.ll"
 /* -*- Mode: c++ -*- */
 #line 39 "/scratch/gdsjaar/seacas/packages/seacas/libraries/aprepro_lib/aprepro.ll"

Also, changing #line 1 <file-name> to #line 2 <file-name> also fixes the problem.

Steps to Reproduce

This can be produced, for example, with one of the ATDM Trilinos CUDA configurations described at:

For example, just log onto hansen or shiller or white or ride, clone Trilinos somewhere, and then configure and build with:

 cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-opt

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_Panzer=ON -DPanzer_ENABLE_TESTS=ON \
  $TRILINOS_DIR

$ ninja -j16  # Builds everything the first time

$ ninja -j16  # Rebuilds a bunch of targets!

You can see what is triggering the rebuild using:

$ ninja -d explain
...
ninja explain: output /scratch/gdsjaar/seacas/packages/seacas/libraries/aprepro_lib/aprepro.ll of phony edge with no inputs doesn't exist
ninja explain: /scratch/gdsjaar/seacas/packages/seacas/libraries/aprepro_lib/aprepro.ll is dirty
ninja explain: packages/seacas/libraries/aprepro_lib/CMakeFiles/aprepro_lib.dir/apr_scanner.cc.o is dirty
...
$ ninja -t deps packages/seacas/libraries/aprepro_lib/CMakeFiles/aprepro_lib.dir/apr_scanner.cc.o
...
    /scratch/gdsjaar/seacas/packages/seacas/libraries/aprepro_lib/aprepro.ll
...

See, that file /scratch/gdsjaar/seacas/packages/seacas/libraries/aprepro_lib/aprepro.ll is not on this system obviously.

Then to fix, just apply the patch shown above, rebuild once and then rebuild again, and you will get:

$ time ninja
ninja: no work to do.

real    0m2.744s
user    0m0.275s
sys     0m0.218s

(Yes, that is ninja evaluating all of the targets to figure out if it needs to build any of the Panzer tests in under 3 sec! That is why we want to be able to use ninja.)

@bartlettroscoe
Copy link
Member Author

I created the issue sandialabs/seacas#97 to see if we can avoid #line ` from getting into this or other such SEACAS files.

I guess if nothing else, we could just replace #line 1 <some-file> with #line <any-number-other-than-1> <some-file> every time that we update the snapshot of SEACAS.

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 9, 2018
This really needs to be fixed in SEACAS proper but this is a quick workaround
that will get us going with Nina + CUDA for ATDM builds of Trilinos.
@bartlettroscoe
Copy link
Member Author

I created PR #2360 to get this addressed in the short-term.

bartlettroscoe added a commit to bartlettroscoe/seacas that referenced this issue Mar 10, 2018
This remove the line starting wtih '#line 1' which causes nvcc to create a
wrong dependency on apr_scanner.o based on the non-existant file
/scratch/gdsjaar/seacas/packages/seacas/libraries/aprepro_lib/aprepro.ll.  See
trilinos/Trilinos#2359.

This is the same patch as the Trilinos patch in PR trilinos/Trilinos#2360.  By
making this patch here as well, the next time SEACAS is snapshotted into
Trilinos, it will not overwrite the patch in trilinos/Trilinos#2360 being
committed to Trilinos.

But the right solution will be to avoid lines starting with '#line 1' in the
future when generating these files.
@bartlettroscoe bartlettroscoe changed the title '#line 1' from flex generated apr_scanner.cpp SEACAS file causing unnecessary rebuilds when using Ninja + CUDA '#line 1' from flex generated apr_scanner.cpp SEACAS file causing unnecessary rebuilds when using Ninja + CUDA-8 Mar 12, 2018
gsjaardema added a commit that referenced this issue Mar 12, 2018
…workaround

Remove '#line 1' as workaround for Ninja + CUDA rebuilds (#2359)
@bartlettroscoe
Copy link
Member Author

@gsjaardema just merged in the PR #2360 and updated the generation code in SEACAS to avoid putting in any #line <num> lines with flex (see sandialabs/seacas#97 and sandialabs/seacas@0020417).

Therefore, this issue should now be resolved. I am just going to do a quick check to make sure, post the result here with full reproducability instructions, and then close if the rebuilds are avoided.

@bartlettroscoe
Copy link
Member Author

On 'ride' for the Trilinos version:

commit 6b05d3ed03bbf6ef03e47d2ab7384589a029e081
Merge: f18f1d2 b43d246
Author: Greg Sjaardema <[email protected]>
Date:   Mon Mar 12 09:19:00 2018 -0600

    Merge pull request #2360 from bartlettroscoe/2359-fix-ninja-rebuilds-workaround
    
    Remove '#line 1' as workaround for Ninja + CUDA rebuilds (#2359)

I did the configure and build of Panzer for the cuda-opt build withwith:

$ bsub -x -I -q rhel7F ./checkin-test-atdm.sh cuda-opt --enable-packages=Panzer --configure --build
***Forced exclusive execution
Job <47745> is submitted to queue <rhel7F>.
<<Waiting for dispatch ...>>
<<Starting on ride10>>

...

Mon Mar 12 10:06:13 MDT 2018

Enabled Packages: Panzer

Build test results:
-------------------
0) MPI_RELEASE_DEBUG_SHARED_PT => Test case MPI_RELEASE_DEBUG_SHARED_PT was not run! => Does not affect push readiness! (-1.00 min)
1) cuda-opt => passed: build-only passed => Not ready to push! (4.30 min)

REQUESTED ACTIONS: PASSED

Then I did a rebuild with:

$ cd cuda-opt/ 

$ source load-env.sh 
Hostname 'ride6' matches known ATDM host 'ride' and system 'ride'
ATDM_CONFIG_TRILNOS_DIR = /home/rabartl/Trilinos.base/Trilinos
Setting default compiler and build options for JOB_NAME='cuda-opt'
Using white/ride compiler stack CUDA to build RELEASE code with Kokkos node type CUDA

$ time ninja -j16
ninja: no work to do.

real    0m0.339s
user    0m0.266s
sys     0m0.069s

That did not rebuild anything.

But to make sure that rebuilds would work I did:

$ touch ~/Trilinos.base/Trilinos/packages/seacas/libraries/aprepro_lib/apr_aprepro.cc

$ time ninja -j16
[60/60] Linking CXX executable packages/panzer/adapters-stk/test/solver/PanzerAdaptersSTK_solver.exe

real    0m44.655s
user    5m9.600s
sys     1m8.888s

$ time ninja -j16
ninja: no work to do.

real    0m2.796s
user    0m0.255s
sys     0m0.272s

This looks good.

Closing as complete.

Thanks @gsjaardema!

ndellingwood added a commit to ndellingwood/Trilinos that referenced this issue Mar 12, 2018
* develop: (261 commits)
  Replace PHX_EVALUATOR_CLASS Macros (trilinos#2322)
  Change time-limit on 'ride' from 4 to 12 hours (TRIL-171)
  Disable Panzer examles for CUDA builds on 'ride' (trilinos#2318, TRIL-171)
  Move ATDMDevEnvUtils.cmake to utils/ subdir (TRIL-171)
  Change from ATDM_CONFIG_USE_MAKEIFLES to ATDM_CONFIG_USE_NINJA (TRIL-171)
  Add ATDM_CONFIG_KNOWN_HOSTNAME var for CTEST_SITE (TRIL-171)
  Misc improvements to checkin-test-atdm.sh (TRIL-171)
  Print the return code from 'ctest -S' command (TRIL-171)
  Tpetra: Sort and merge specializations for Tpetra::CrsGraph (trilinos#2354)
  Remove '#line 1' as workaround for Ninja + CUDA rebuilds (trilinos#2359)
  Added function to return the full vector of data in the indexer rather than elem by elem (trilinos#2355)
  Panzer: Fixing rotation matrix calculation in integration values
  Tpetra: Adding an additive Schwarz test
  checkin-test driver scripts for local testing ATDM builds of Trilinos
  Add list of all supported builds on ride (TRIL-171)
  Add list of all supported builds shiller (TRIL-171)
  Fix location of nvcc_wrapper and other fixups (TRIL-171)
  Improve the 'date' output and start and end (TRIL-171)
  Change name ATDM_HOSTNAME to ATDM_SYSTEM_NAME (TRIL-171)
  Print KOKKOS_ARCH to STDOUT (trilinos#1400)
  ...

Conflicts:
	packages/shylu/shylu_node/tacho/src/TachoExp_NumericTools.hpp
	packages/shylu/shylu_node/tacho/unit-test/Tacho_TestCrsMatrixBase.hpp
@bartlettroscoe
Copy link
Member Author

Closing as complete.

@bartlettroscoe bartlettroscoe added the PA: Data Services Issues that fall under the Trilinos Data Services Product Area label Nov 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client: ATDM Any issue primarily impacting the ATDM project PA: Data Services Issues that fall under the Trilinos Data Services Product Area pkg: seacas
Projects
None yet
Development

No branches or pull requests

1 participant