h100 timings for benzene
Eric Bylaska edited this page May 22, 2024
- This example is FFT dominant.
Date: 5/20/24
Directory: /home/bylaska/PWDFT3/QA/benzene
The table reports timings, in seconds, for this benzene run on one h100 node as the number of CPU cores (ncpus) varies. The cputime column is followed by a breakdown into these components:
- non-local: time spent in non-local operations.
- ffm: time spent in ffm operations.
- fmf: time spent in fmf operations.
- fft: time spent in FFT (Fast Fourier Transform) operations.
- diagonalize: time spent in diagonalization.
In the h100 binary, FFT operations use the CUDA FFTs, and the BLAS3 operations are also executed on the GPU. Note that the GPUs become overloaded beyond ncpus=8: cputime rises from 2.491e-01 s at ncpus=8 to 3.988e-01 s at ncpus=16 and 7.752e-01 s at ncpus=32.
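The slowdown past ncpus=8 is easier to see as speedup relative to the ncpus=1 run. The following is a minimal sketch, not part of PWDFT: the `cputime` values are copied from the table on this page, and the helper names `speedup` and `efficiency` are hypothetical, introduced only for illustration.

```python
# cputime in seconds, keyed by ncpus, copied from the table on this page.
cputime = {
    1: 7.695e-01, 2: 4.281e-01, 3: 5.281e-01, 4: 4.253e-01,
    5: 5.054e-01, 6: 2.403e-01, 7: 2.406e-01, 8: 2.491e-01,
    16: 3.988e-01, 32: 7.752e-01,
}


def speedup(ncpus):
    """Speedup of a run relative to the ncpus=1 run (hypothetical helper)."""
    return cputime[1] / cputime[ncpus]


def efficiency(ncpus):
    """Parallel efficiency: speedup divided by ncpus (hypothetical helper)."""
    return speedup(ncpus) / ncpus


# ncpus value with the smallest cputime.
best = min(cputime, key=cputime.get)

print(" ncpus   cputime  speedup  efficiency")
for n in sorted(cputime):
    print(f"{n:6d} {cputime[n]:9.3e} {speedup(n):8.2f} {efficiency(n):11.2f}")
print("fastest run: ncpus =", best)
```

With these numbers the fastest run is ncpus=6, the runs at ncpus=6 through 8 are within a few percent of each other, and the ncpus=32 run is slightly slower than the ncpus=1 run.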
Command, run from /home/bylaska/PWDFT3/QA/benzene: mpirun -np 1 ../../build_cuda/pwdft beznene.nw
machine | nodes | ncpus | cputime | non-local | ffm | fmf | fft | diagonalize |
---|---|---|---|---|---|---|---|---|
h100 | 1 | 1 | 7.695e-01 | 2.798e-02 | 7.686e-03 | 1.898e-02 | 4.873e-01 | 1.219e-05 |
h100 | 1 | 2 | 4.281e-01 | 1.267e-02 | 5.474e-03 | 1.165e-02 | 2.955e-01 | 1.499e-05 |
h100 | 1 | 3 | 5.281e-01 | 1.891e-02 | 1.643e-02 | 1.188e-02 | 6.766e-01 | 1.563e-05 |
h100 | 1 | 4 | 4.253e-01 | 1.085e-02 | 9.099e-03 | 7.906e-03 | 3.945e-01 | 3.114e-06 |
h100 | 1 | 5 | 5.054e-01 | 1.179e-02 | 1.573e-02 | 7.999e-03 | 5.701e-01 | 1.559e-05 |
h100 | 1 | 6 | 2.403e-01 | 6.199e-03 | 5.031e-03 | 7.052e-03 | 1.826e-01 | 1.430e-05 |
h100 | 1 | 7 | 2.406e-01 | 6.280e-03 | 5.504e-03 | 7.886e-03 | 1.876e-01 | 1.462e-05 |
h100 | 1 | 8 | 2.491e-01 | 6.597e-03 | 6.282e-03 | 9.582e-03 | 1.967e-01 | 1.477e-05 |
h100 | 1 | 16 | 3.988e-01 | 8.835e-03 | 1.253e-02 | 1.827e-02 | 3.416e-01 | 1.481e-05 |
h100 | 1 | 32 | 7.752e-01 | 2.029e-02 | 2.574e-02 | 3.472e-02 | 6.813e-01 | 1.484e-05 |
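
To check how strongly the FFT dominates, one can divide the fft column by the cputime column for each row. This is a rough sketch under the assumption that the two columns are directly comparable; the page does not say how each timer is accumulated. The values are copied from the table above, and `fft_ratio` is a hypothetical helper name.

```python
# (ncpus, cputime, fft) in seconds, copied from the table above.
rows = [
    (1, 7.695e-01, 4.873e-01),
    (2, 4.281e-01, 2.955e-01),
    (3, 5.281e-01, 6.766e-01),
    (4, 4.253e-01, 3.945e-01),
    (5, 5.054e-01, 5.701e-01),
    (6, 2.403e-01, 1.826e-01),
    (7, 2.406e-01, 1.876e-01),
    (8, 2.491e-01, 1.967e-01),
    (16, 3.988e-01, 3.416e-01),
    (32, 7.752e-01, 6.813e-01),
]


def fft_ratio(cputime, fft):
    """Ratio of the fft timing to cputime (hypothetical helper)."""
    return fft / cputime


ratios = {n: fft_ratio(c, f) for n, c, f in rows}

# Rows where the reported fft time is larger than the reported cputime.
over = [n for n, r in ratios.items() if r > 1.0]

for n, r in ratios.items():
    print(f"ncpus={n:2d}  fft/cputime={r:.2f}")
print("rows with fft > cputime:", over)
```

The ratio stays above 0.6 in every row, which is consistent with the FFT-dominant description. For ncpus=3 and ncpus=5 the reported fft time exceeds cputime, so the ratio is above 1 there; the sketch reports the table values as they are and does not attempt to explain this.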