h100 timings for benzene

Eric Bylaska edited this page May 22, 2024 · 5 revisions

5/20/24 - Benzene Benchmark

  • This example is FFT dominant.

Date: 5/20/24

Directory: /home/bylaska/PWDFT3/QA/benzene

The table below reports timings for the benzene benchmark on a single h100 node as the number of CPU cores (ncpus) varies. All timings are in seconds. The cputime column is the overall timing, and the remaining columns give the timings of individual components:

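  • non-local: Timings for the non-local pseudopotential operations.
  • ffm: Timings for ffm operations (BLAS3 products of two orbital arrays that produce a small matrix).
  • fmf: Timings for fmf operations (BLAS3 products of an orbital array with a small matrix).
  • fft: Timings for FFT (Fast Fourier Transform) operations.
  • diagonalize: Timings for matrix diagonalization.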
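As a minimal sketch of how one row of the timings table maps onto these components, the Python snippet below parses a row and reports what fraction of cputime a given component accounts for. The names `TimingRow`, `parse_row`, and `component_share` are hypothetical helpers introduced for illustration only; they are not part of PWDFT. The sample row is the ncpus=1 entry copied from the table on this page.

```python
from dataclasses import dataclass


@dataclass
class TimingRow:
    """One row of the timings table; all timing fields are in seconds."""
    machine: str
    nodes: int
    ncpus: int
    cputime: float
    non_local: float
    ffm: float
    fmf: float
    fft: float
    diagonalize: float


def parse_row(line: str) -> TimingRow:
    """Parse one table row (hypothetical helper, not part of PWDFT).

    Accepts whitespace-separated rows and also rows that use '|' as a
    column separator.
    """
    parts = line.replace("|", " ").split()
    if len(parts) != 9:
        raise ValueError(f"expected 9 columns, got {len(parts)}: {line!r}")
    machine, nodes, ncpus, *times = parts
    return TimingRow(machine, int(nodes), int(ncpus), *(float(t) for t in times))


def component_share(row: TimingRow, component: str) -> float:
    """Return the named component's timing divided by cputime."""
    return getattr(row, component) / row.cputime


# ncpus=1 row copied from the table on this page.
row = parse_row(
    "h100 1 1 7.695e-01 2.798e-02 7.686e-03 1.898e-02 4.873e-01 1.219e-05"
)
print(f"fft share of cputime at ncpus={row.ncpus}: {component_share(row, 'fft'):.1%}")
```

For the ncpus=1 row the fft column is by far the largest component, which is consistent with the note that this example is FFT dominant.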

In the h100 binary, FFT operations use the CUDA FFTs, and the BLAS3 operations are also executed on the GPU. Note that the GPUs become overloaded beyond ncpus=8: cputime rises from 2.491e-01 s at ncpus=8 to 3.988e-01 s at ncpus=16 and 7.752e-01 s at ncpus=32.

Directory: /home/bylaska/PWDFT3/QA/benzene

Command: mpirun -np 1 ../../build_cuda/pwdft beznene.nw

GPU Timings Table

| machine | nodes | ncpus | cputime | non-local | ffm | fmf | fft | diagonalize |
|---------|-------|-------|---------|-----------|-----|-----|-----|-------------|
| h100 | 1 | 1 | 7.695e-01 | 2.798e-02 | 7.686e-03 | 1.898e-02 | 4.873e-01 | 1.219e-05 |
| h100 | 1 | 2 | 4.281e-01 | 1.267e-02 | 5.474e-03 | 1.165e-02 | 2.955e-01 | 1.499e-05 |
| h100 | 1 | 3 | 5.281e-01 | 1.891e-02 | 1.643e-02 | 1.188e-02 | 6.766e-01 | 1.563e-05 |
| h100 | 1 | 4 | 4.253e-01 | 1.085e-02 | 9.099e-03 | 7.906e-03 | 3.945e-01 | 3.114e-06 |
| h100 | 1 | 5 | 5.054e-01 | 1.179e-02 | 1.573e-02 | 7.999e-03 | 5.701e-01 | 1.559e-05 |
| h100 | 1 | 6 | 2.403e-01 | 6.199e-03 | 5.031e-03 | 7.052e-03 | 1.826e-01 | 1.430e-05 |
| h100 | 1 | 7 | 2.406e-01 | 6.280e-03 | 5.504e-03 | 7.886e-03 | 1.876e-01 | 1.462e-05 |
| h100 | 1 | 8 | 2.491e-01 | 6.597e-03 | 6.282e-03 | 9.582e-03 | 1.967e-01 | 1.477e-05 |
| h100 | 1 | 16 | 3.988e-01 | 8.835e-03 | 1.253e-02 | 1.827e-02 | 3.416e-01 | 1.481e-05 |
| h100 | 1 | 32 | 7.752e-01 | 2.029e-02 | 2.574e-02 | 3.472e-02 | 6.813e-01 | 1.484e-05 |
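
One way to read the scaling in this table is to compute speedup and parallel efficiency relative to the ncpus=1 run. The Python sketch below does that using the cputime values copied from the table. The helper names `speedup` and `efficiency` are hypothetical, and the definitions used (speedup = t(1)/t(n), efficiency = speedup/n) are the conventional ones, assumed here rather than taken from PWDFT.

```python
# cputime in seconds for each ncpus, copied from the GPU timings table.
CPUTIME = {
    1: 7.695e-01, 2: 4.281e-01, 3: 5.281e-01, 4: 4.253e-01,
    5: 5.054e-01, 6: 2.403e-01, 7: 2.406e-01, 8: 2.491e-01,
    16: 3.988e-01, 32: 7.752e-01,
}


def speedup(ncpus: int) -> float:
    """Speedup relative to the ncpus=1 run: t(1) / t(ncpus)."""
    return CPUTIME[1] / CPUTIME[ncpus]


def efficiency(ncpus: int) -> float:
    """Parallel efficiency: speedup divided by ncpus."""
    return speedup(ncpus) / ncpus


# ncpus value with the smallest cputime.
fastest = min(CPUTIME, key=CPUTIME.get)

for n in sorted(CPUTIME):
    print(f"ncpus={n:2d}  speedup={speedup(n):5.2f}  efficiency={efficiency(n):5.2f}")
print(f"fastest run: ncpus={fastest}")
```

With these numbers the shortest cputime falls at ncpus=6, and the ncpus=32 run is slightly slower than the ncpus=1 run.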