
How could I use "memoptest" to do UPI stress and make the total percentage of UPI bandwidth up to 100% ? #64

Closed
WisQTuser opened this issue Mar 14, 2018 · 26 comments

@WisQTuser

Hi Developers,
I'd like to use the PCM tools to monitor UPI traffic and run a UPI stress test on my server. The server has two processors (Intel Xeon Gold 5115, 2.4 GHz) and 16 x 8 GB DDR4-2400.
I launch the UPI monitor with ./pcm.x and run the memory bandwidth test with "numactl --cpunodebind=0 --membind=1 ./memoptest 2" and "numactl --cpunodebind=1 --membind=0 ./memoptest 2".
[screenshot: memoptest]

I have some questions that I need your help with.

Q1. When I run the memory bandwidth test with these two commands, pcm.x shows numbers in the UPI0 and UPI1 columns. What do 13G and 55% mean?
Q2. If 55% is the total UPI bandwidth utilization, how can I drive the UPI bandwidth up to 100%?
[screenshot: stress]

Thanks.

@opcm
Contributor

opcm commented Mar 15, 2018

memoptest is single-threaded. A single thread cannot consume the whole bandwidth capacity. Please try running many memoptest processes in parallel to get close to 100% utilization.
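
Editorial illustration of this suggestion: a minimal Python sketch that builds one numactl command line per memoptest instance, covering both cross-socket directions from the original post. The helper names (`memoptest_commands`, `launch`) and the instance count are hypothetical. Only the numactl flags and the memoptest mode argument come from the thread.

```python
import subprocess


def memoptest_commands(instances_per_direction, mode, binary="./memoptest"):
    """Build numactl command lines that push traffic across UPI both ways.

    Each (cpu_node, mem_node) pair pins the threads to one socket and the
    memory to the other, as in the commands from the original post.
    """
    directions = [(0, 1), (1, 0)]  # (CPU node, memory node)
    commands = []
    for cpu_node, mem_node in directions:
        for _ in range(instances_per_direction):
            commands.append([
                "numactl",
                f"--cpunodebind={cpu_node}",
                f"--membind={mem_node}",
                binary,
                str(mode),  # memoptest traffic type, e.g. 2 as in the post
            ])
    return commands


def launch(commands):
    """Start every command without waiting and return the process handles."""
    return [subprocess.Popen(cmd) for cmd in commands]
```

Calling `launch(memoptest_commands(4, 2))` would start eight processes, four per direction. How many instances are needed to load the links is something to find by experiment while watching pcm.x.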

@opcm opcm added the question label Mar 15, 2018
@WisQTuser
Author

WisQTuser commented Mar 19, 2018

Hi Developers,
I tried running many memoptest processes in parallel, but the result is the same as before: the total UPI bandwidth percentage still does not get close to 100%. The memory streaming bandwidth is simply divided equally among the processes. Please refer to the screenshots below.

[screenshot: four windows]

[screenshot: six windows]

@opcm
Contributor

opcm commented Mar 19, 2018

Hi, could you please try to use many memoptest instances but with read-only traffic to drive the utilization up? This will be option "0" instead of "2".

It is also important to disable compiler optimizations with the "-O0" option:

g++ -O0 -std=c++11 memoptest.cpp -o memoptest

@WisQTuser
Author

Hi Developers,
I followed your suggestions: I ran many memoptest instances with read-only traffic and disabled compiler optimizations. The total UPI bandwidth percentage rose to about 85% but still does not reach 100%. If I keep opening new windows to run memoptest with option "0", the memory read bandwidth drops to 17XX MB/s, as shown in the picture below. Is there anything else I can do? Thanks.
[screenshot: Intel UPI stress up to 85%]

@opcm
Contributor

opcm commented Mar 20, 2018

Can you try to use a mix of read and write traffic? There is also a specialized tool that can trigger different traffic patterns: https://software.intel.com/en-us/articles/intelr-memory-latency-checker
Options are --bandwidth_matrix and -Wn, where n is the type of traffic

@WisQTuser
Author

WisQTuser commented Mar 26, 2018

Hi Developers,
I tried the Intel Memory Latency Checker to trigger different traffic patterns, using the options --bandwidth_matrix -Wn with n = 2. But the total UPI bandwidth percentage does not stay high. Once the NUMA node number appears, the UPI bandwidth percentage rises to 5X%; after about two seconds it drops to a low value (0%). What should I do to keep the UPI bandwidth at its highest percentage, as with memoptest?

[screenshot: 2]

[screenshot: 1]

@opcm
Contributor

opcm commented Mar 26, 2018

Could you please increase the test phase duration? I believe this is the -t option.

@WisQTuser
Author

Hi Developers,
I increased the test phase duration with -t, but the result is the same: I still can't keep the UPI bandwidth at its highest percentage.
[screenshot: t]

@WisQTuser
Author

Hi developers,
Should I try another method or command that can drive the UPI traffic to 100%?

@opcm
Contributor

opcm commented Apr 9, 2018

The parameter value you have chosen is in seconds. According to the screenshot, it is still running the local memory test. Did it ever finish? Please choose a smaller value (e.g. 16 seconds per matrix element).

@WisQTuser
Author

Hi developers,
I used a smaller value to run the test, but the situation is the same as before: I still can't drive the UPI traffic to 100%. The bandwidth rises, to maybe about 90%, only while the test runs specific matrix elements. The whole test finishes in about five minutes. I want the stress test to keep the UPI bandwidth at 100% overnight or even longer.
[screenshot: mlc_1]
[screenshot: mlc_2]
[screenshot: mlc_3]

@opcm
Contributor

opcm commented Apr 12, 2018

I could get 96% utilization with these parameters:

--loaded_latency -omlc_2s_10c_ro-remote.cfg -d0 -t1000

with this config file (mlc_2s_10c_ro-remote.cfg)

0-9 R seq 300000 dram 1
28-37 R seq 300000 dram 0

I guess on your 10-core CPU you need to change it to:

0-9 R seq 300000 dram 1
10-19 R seq 300000 dram 0

You can increase the -t parameter to run the test for longer.
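
Editorial illustration for the overnight run asked about earlier: a minimal Python sketch that assembles the command line above with a longer -t value. It assumes -t is interpreted as seconds here, as the maintainer describes for the matrix test. The function name and the `./mlc` binary path are hypothetical; the flags are copied from this comment.

```python
def mlc_loaded_latency_command(config_file, hours, mlc_binary="./mlc"):
    """Return the mlc argument list for a long loaded-latency run.

    Assumption: the -t value is a duration in seconds.
    """
    seconds = int(hours * 3600)
    return [
        mlc_binary,            # hypothetical path to the mlc executable
        "--loaded_latency",
        f"-o{config_file}",    # per-thread traffic description file
        "-d0",
        f"-t{seconds}",
    ]
```

For example, `mlc_loaded_latency_command("mlc_2s_10c_ro-remote.cfg", 12)` ends with `-t43200`, which would correspond to twelve hours under the assumption above.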

@WisQTuser
Author

Hi developers,
I followed your instructions to run the test but only get about 60% utilization.
Is there anything wrong with my command or config file?
[screenshot: mlc_4]

@opcm
Contributor

opcm commented Apr 13, 2018

Could you please run mlc without parameters and post the mlc output here as text? (I just want to see whether your platform configuration is healthy.)

@opcm
Contributor

opcm commented Apr 13, 2018

/proc/cpuinfo is also interesting to check

@WisQTuser
Author

Hi Developers,
I have posted the mlc output and cpuinfo as text.
Please refer to them.
Thanks.
output.txt
cpuinfo.txt

@opcm
Contributor

opcm commented Apr 16, 2018

You have a very weird OS processor -> socket topology (round robin). I did not expect that. Here is a fixed configuration file:

0,2,4,6,8,10,12,14,16,18 R seq 300000 dram 1
1,3,5,7,9,11,13,15,17,19 R seq 300000 dram 0
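
Editorial illustration of how such a file could be derived instead of guessed: a minimal Python sketch that reads a CPU-to-socket mapping in the comma-separated form printed by `lscpu -p=CPU,SOCKET` and emits two lines shaped like the ones above. It assumes a two-socket machine, that NUMA node numbers match socket numbers, and that the first ten logical CPUs of each socket are the ones to use. The function name is hypothetical, and the numeric field (300000) is copied from the maintainer's file without interpreting it.

```python
from collections import defaultdict


def remote_read_config(lscpu_text, cpus_per_socket=10, size_field=300000):
    """Build two mlc config lines that make each socket read remote DRAM."""
    by_socket = defaultdict(list)
    for line in lscpu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and lscpu header comments
        cpu, socket = (int(field) for field in line.split(",")[:2])
        by_socket[socket].append(cpu)

    if sorted(by_socket) != [0, 1]:
        raise ValueError("this sketch only handles a two-socket system")

    lines = []
    for socket in (0, 1):
        cpus = sorted(by_socket[socket])[:cpus_per_socket]
        remote_node = 1 - socket  # assumption: NUMA node id == socket id
        cpu_list = ",".join(str(c) for c in cpus)
        lines.append(f"{cpu_list} R seq {size_field} dram {remote_node}")
    return "\n".join(lines)
```

With a round-robin mapping (even CPUs on socket 0, odd CPUs on socket 1), this reproduces the two lines of the fixed configuration file above.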

@WisQTuser
Author

Hi Developers,
After switching to the fixed configuration file, I get about 90% utilization. May I ask what the configuration file and the command "--loaded_latency -omlc_2s_10c_ro-remote.cfg -d0 -t1000" mean? Could I change some values to get higher utilization? Thanks again for your support.
[screenshot: 90]

@opcm
Contributor

opcm commented Apr 17, 2018

The options and configuration file format are described in the readme file in the mlc package. You might try other traffic types and/or random traversal patterns.

@WisQTuser
Author

WisQTuser commented Apr 18, 2018

OK, I will check the readme file, thanks. Another question about pcm.x: the bandwidth utilization shows 90%, and UPI0 and UPI2 show 21G. There are two Intel Skylake CPUs in our server, and by my calculation the peak bandwidth is 10.4 GT/s * 2 channels * 2 bytes/channel = 41.6 GB/s (peak), i.e. 20.8 GB/s per channel. How does pcm.x arrive at 21G for 90%?

@opcm
Contributor

opcm commented Apr 20, 2018

The formula you are using gives a somewhat pessimistic estimate of the maximum Intel UPI throughput. Intel UPI may pack data into packets more efficiently. PCM uses a more optimistic estimate of the maximum throughput that assumes good data packing.
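
Editorial illustration of the gap between the two estimates, using only the numbers quoted in this thread: a short Python sketch. It assumes the 21G reading and the 90% figure refer to the same link and direction, and that 21G is a rounded display value, so the result is indicative only. The function names are hypothetical.

```python
def link_peak_gb_per_s(transfer_rate_gt_per_s, bytes_per_transfer):
    """Peak per-link throughput from the formula used in the question."""
    return transfer_rate_gt_per_s * bytes_per_transfer


def implied_capacity_gb_per_s(observed_gb_per_s, utilization):
    """Capacity that would make the observed traffic equal the utilization."""
    return observed_gb_per_s / utilization


pessimistic = link_peak_gb_per_s(10.4, 2)          # figures from the question
implied = implied_capacity_gb_per_s(21.0, 0.90)    # 21G shown at 90%

print(f"formula from the question: {pessimistic:.1f} GB/s per link")
print(f"capacity implied by 21G at 90%: {implied:.1f} GB/s per link")
```

Under those assumptions the implied denominator is about 23.3 GB/s per link, higher than the 20.8 GB/s from the question's formula, which is consistent with PCM assuming better data packing.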

@WisQTuser
Author

So if I now have to test the maximum throughput of Intel UPI, which standard can I refer to in order to judge whether the result passes or fails? I can now use the command to drive the total UPI bandwidth to 93%; I just want to know whether 93% of UPI bandwidth meets the standard or not.

@opcm
Contributor

opcm commented Apr 25, 2018

As far as I know, there is no standard that defines the theoretical maximum. It is workload-dependent.

@WisQTuser
Author

Thanks for your answer. So you mean the UPI bandwidth percentage can reach 100% if the workload is increased? How can I increase the workload, for example by changing the hardware configuration or using other traffic types?

@WisQTuser
Author

Hi Developers,
Do you mean the UPI bandwidth percentage can reach 100% if the workload is increased? How can I increase the workload, for example by changing the hardware configuration or using other traffic types?

@opcm
Contributor

opcm commented Feb 6, 2019

I never managed to drive the utilization to 100%. I think a specific synthetic test is required but I don't know how to implement it.
