taylorGreenVortex_p3 is unstable on Trilinos develop #858

Closed · PaulMullowney opened this issue Apr 30, 2021 · 20 comments

@PaulMullowney (Contributor)

The taylorGreenVortex_p3 regression test is blowing up on GPUs (Eagle) on Trilinos develop. This test uses all Trilinos/Tpetra algorithms and I think it is matrix-free. I've tried two versions of Trilinos, develop and master, current as of 4/30/2021. Master works fine. The first iteration seems OK; however, the second shows instability. Ultimately the norms grow out of control.

 1/1    Equation System Iteration
        dpdx                   6       5.22243e-07     0.00112119             1
        pressure               9       2.23513e-05      0.0597388             1
        dpdx                   7       9.86341e-08    0.000835196      0.744917
 1/2      myLowMach
        velocity             200         0.0495843     0.00127094             1
        pressure              12       0.000334666        1.45767       24.4007
        dpdx                   9       1.02156e-06      0.0131696        11.746

@jrood-nrel (Contributor)

@alanw0 @jhux2 @rcknaus Would any of you be able to help with this?

@rcknaus (Contributor) commented Apr 30, 2021

I'll take a look on our ascicgpu systems to see if it happens there too.

@alanw0 (Contributor) commented Apr 30, 2021

Tagging @tasmith4. I heard there was a big Tpetra/UVM commit very recently; it was mentioned in a Trilinos issue for Stefan's version of Nalu. Perhaps the checking that @rcknaus is doing will show whether that's relevant here.

@rcknaus (Contributor) commented Apr 30, 2021

Ran with CUDA 11/GCC 7.2, Trilinos commit 474975f, and nalu-wind commit 176f3c3 without seeing this issue:

1/1    Equation System Iteration
        dpdx                   5       5.55982e-09    0.000172415             1
        pressure               8        5.9153e-10     2.5897e-06             1
        dpdx                   5       2.70065e-12    4.66969e-08    0.00027084
 1/2      myLowMach
        velocity              17        5.1624e-08    7.69845e-05             1
        pressure              11       2.24153e-09    4.99644e-06       1.92935
        dpdx                   5       3.84463e-11    7.75728e-07    0.00449919
 2/2      myLowMach
        velocity              18       1.03971e-09    1.81834e-06     0.0236196
        pressure              11       1.32057e-09    4.64508e-06       1.79367
        dpdx                   5       4.21902e-11    8.00755e-07    0.00464434

taylorGreenVortex_p3 differs from the other tests in that its momentum solve uses Belos's block GMRES. That would be my main suspicion; it would be worth switching to pseudoblock GMRES or BiCGStab instead.

Is UVM off in the Eagle Trilinos configuration?
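
For illustration, here is a minimal sketch of the kind of solver swap suggested above, going through Belos::SolverFactory directly. The type aliases (Tpetra defaults), the helper name makeMomentumSolver, and the parameter handling are assumptions for this sketch; it is not Nalu-Wind's actual solver setup path.

// Sketch: requesting pseudoblock GMRES from Belos instead of block GMRES.
// Tpetra default template parameters are assumed throughout.
#include <BelosSolverFactory.hpp>
#include <BelosTpetraAdapter.hpp>
#include <Tpetra_MultiVector.hpp>
#include <Tpetra_Operator.hpp>
#include <Teuchos_ParameterList.hpp>
#include <Teuchos_RCP.hpp>

using SC = Tpetra::MultiVector<>::scalar_type;
using LO = Tpetra::MultiVector<>::local_ordinal_type;
using GO = Tpetra::MultiVector<>::global_ordinal_type;
using NT = Tpetra::MultiVector<>::node_type;
using MV = Tpetra::MultiVector<SC, LO, GO, NT>;
using OP = Tpetra::Operator<SC, LO, GO, NT>;

Teuchos::RCP<Belos::SolverManager<SC, MV, OP>>
makeMomentumSolver(const Teuchos::RCP<Teuchos::ParameterList>& solverParams)
{
  Belos::SolverFactory<SC, MV, OP> factory;
  // "Pseudoblock GMRES" (or "BiCGStab") in place of "Block GMRES".
  return factory.create("Pseudoblock GMRES", solverParams);
}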

@PaulMullowney (Contributor, Author) commented Apr 30, 2021

474975f still blows up for me. I was on 5a2c077 when I first tested. Looking at the Trilinos CMakeCache.txt, it appears that UVM is on for all the relevant variables:
TpetraCore_ENABLE_CUDA_UVM:BOOL=ON
KokkosKernels_INST_MEMSPACE_CUDAUVMSPACE:BOOL=ON
Kokkos_ENABLE_CUDA_UVM:BOOL=ON

I did build with CUDA 10.2.89.

@rcknaus (Contributor) commented Apr 30, 2021

I switched to CUDA 10.2.89 and was able to reproduce the issue locally:

1/1    Equation System Iteration
        dpdx                   6       5.22243e-07     0.00112119             1
        pressure               9       2.23556e-05      0.0597388             1
        dpdx                   7         1.047e-07    0.000847884      0.756233
 1/2      myLowMach
        velocity             200         0.0504052     0.00127447             1
        pressure              12       0.000337934        1.47776       24.7371
        dpdx                   9       1.03772e-06      0.0133597       11.9156
 2/2      myLowMach
        velocity             200           10499.8        45.7795       35920.4
        pressure              11         0.0129725        31.9624       535.037
        dpdx                   8        0.00011037       0.573632       511.626

Not sure exactly what's going wrong, but I'll take a look at it Monday.

@overfelt (Contributor) commented May 3, 2021

I do not see this issue on Summit running CUDA 10.2.89 and GCC 7.4.0 with Trilinos develop 1487be4, which is up to date as of 5/3/21.

@rcknaus (Contributor) commented May 3, 2021

Seems like the issue is with the multivector doExport producing different results before/after the recent Tpetra change Alan mentioned. It might be a memory error, since doExport gets called frequently in the same manner in the other two matrix-free GPU regression tests, which aren't failing, and the problem doesn't appear on other platforms/build configurations.
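
For reference, a minimal sketch of the owned+shared to owned export-with-ADD pattern being described, assuming Tpetra's default template parameters; the maps, function name, and signature are illustrative, not the actual Nalu-Wind assembly code.

// Sketch of the multivector doExport pattern under suspicion: a residual is
// assembled on an overlapping (owned+shared) map and accumulated onto the
// owned (one-to-one) map. Maps and names here are illustrative assumptions.
#include <Tpetra_Core.hpp>
#include <Tpetra_Map.hpp>
#include <Tpetra_MultiVector.hpp>
#include <Tpetra_Export.hpp>
#include <Teuchos_RCP.hpp>

using map_type    = Tpetra::Map<>;
using mv_type     = Tpetra::MultiVector<>;
using export_type = Tpetra::Export<>;

void accumulateResidual(const Teuchos::RCP<const map_type>& sharedMap, // owned+shared rows
                        const Teuchos::RCP<const map_type>& ownedMap,  // one-to-one rows
                        const mv_type& localResidual,                  // built on sharedMap
                        mv_type& ownedResidual)                        // lives on ownedMap
{
  // Export from the overlapping map to the owned map, summing contributions
  // for rows shared between ranks. This is the operation reported above to
  // give different results before/after the recent Tpetra change.
  export_type exporter(sharedMap, ownedMap);
  ownedResidual.putScalar(0.0);
  ownedResidual.doExport(localResidual, exporter, Tpetra::ADD);
}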

@tasmith4 (Contributor) commented May 3, 2021

FYI @kddevin

@jhux2 (Contributor) commented May 4, 2021

@rcknaus Can you post a CMake configure recipe and the module environment for the passing/non-passing builds?

@rcknaus (Contributor) commented May 4, 2021

@jhux2 ascic-build-env.tar.gz on ascicgpu24.

I tested trilinos/Trilinos@cbeb75a with CUDA 10, which passed, and trilinos/Trilinos@474975f with CUDA 10, which failed but passed with CUDA 11.

@rcknaus (Contributor) commented May 5, 2021

Looking at it a bit more, rather than being a CUDA 10 vs. CUDA 11 issue, TPETRA_ASSUME_CUDA_AWARE_MPI=1 passes while TPETRA_ASSUME_CUDA_AWARE_MPI=0 fails with trilinos/Trilinos@474975f, irrespective of CUDA version; my CUDA 11 build just happened to have that variable set to 1.

@jhux2 (Contributor) commented May 6, 2021

@rcknaus Thanks for your legwork.

I ran the Tpetra unit test suite on ascicgpu031. If TPETRA_ASSUME_CUDA_AWARE_MPI=1, all Tpetra unit tests pass. If TPETRA_ASSUME_CUDA_AWARE_MPI is undefined, then TpetraCore_CrsMatrix_Bug8794_MPI_4 fails. This test was implemented to cover issue trilinos/Trilinos#8794.

@rcknaus How did you narrow it down to doExport -- git bisection, something else? I'm just trying to figure out how this might be reproduced outside Nalu-Wind. It may be that the current unit tests don't cover the case that makes doExport fail.

@kddevin commented May 6, 2021

I can look at TpetraCore_CrsMatrix_Bug8794_MPI_4; Kyungjoo was having some problems with that test as well. That test exercises dense matrix rows more so than other Tpetra tests. I wouldn't expect doExport to care about the density, but perhaps it does. Does the Nalu test have dense rows?

@rcknaus (Contributor) commented May 6, 2021

@jhux2 Nothing special. The residual computation for the gradient is wrong in the TPETRA_ASSUME_CUDA_AWARE_MPI=0 case, so I just dumped the residual multivector for the local computation on the owned+shared domain (the source vector for the export) and the resulting residual on the owned domain after the export add operation. The source vector is the same in both the working and non-working configurations, but the result of the doExport is very different for some entries.

dpdx.tar.gz

This is the version with TPETRA_ASSUME_CUDA_AWARE_MPI=1. I also have the files dumped with TPETRA_ASSUME_CUDA_AWARE_MPI=0 if you want those as well.
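
For illustration, a sketch of one way such multivectors could be dumped for side-by-side comparison, using Tpetra's MatrixMarket writer; the file names and the choice of writer are assumptions, not necessarily how the attached dpdx data was produced.

// Sketch: writing the export source and result multivectors to MatrixMarket
// files so runs with TPETRA_ASSUME_CUDA_AWARE_MPI=0 and =1 can be diffed.
// File names below are placeholders.
#include <MatrixMarket_Tpetra.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <Tpetra_MultiVector.hpp>

using crs_type    = Tpetra::CrsMatrix<>;
using mv_type     = Tpetra::MultiVector<>;
using writer_type = Tpetra::MatrixMarket::Writer<crs_type>;

void dumpResiduals(const mv_type& sourceResidual, const mv_type& ownedResidual)
{
  // Writes dense MatrixMarket files that can be compared across configurations.
  writer_type::writeDenseFile("residual_source.mm", sourceResidual);
  writer_type::writeDenseFile("residual_owned.mm", ownedResidual);
}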

@jhux2 (Contributor) commented May 6, 2021

@rcknaus Ok, this is good data to have. I'll see if I can craft a stand-alone test from it.

@kddevin commented May 7, 2021

@jhux2 The failing Tpetra test does appear to be giving different results after doImport depending on TPETRA_ASSUME_CUDA_AWARE_MPI. I am tracking it today.

@kddevin commented May 7, 2021

@rcknaus I pushed a fix in trilinos/Trilinos#9117. It would be great if you could try it in Nalu.
Thank you for reporting the problem and narrowing it down to doExport.

@rcknaus (Contributor) commented May 7, 2021

@kddevin I tested trilinos/Trilinos#9117 with nalu-wind and confirmed that taylorGreenVortex_p3 runs successfully irrespective of TPETRA_ASSUME_CUDA_AWARE_MPI with that change. Thank you for looking into this.

@PaulMullowney (Contributor, Author)

This appears to be resolved, as the test seems to be stable in our nightly builds. Thanks for everyone's help tracking this down.
