The CPU load during initialization is constantly at 100% for each postcli process. This has been reported by many users.
This issue is due to NVIDIA's specific implementation of OpenCL synchronization. The enqueued buffer read operation constantly polls the status of the running OpenCL kernel, which puts the CPU under high load.
A workaround would be to put the CPU thread to sleep for a defined duration right after enqueuing the kernel, and only then enqueue the buffer read. The sleep duration can be obtained by averaging the execution time over a number of kernel executions and then scaling it by a safety factor (e.g. the sleep duration is 90% of the kernel execution time).
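The sleep-duration calculation described above could be sketched as follows. This is only an illustration, not postcli code: the class name `SleepEstimator`, the window size, and the 28 ms sample timings are assumptions made for the example. Only the 90% safety factor comes from the proposal.

```python
from collections import deque


class SleepEstimator:
    """Hypothetical helper: moving average of kernel times, scaled by a safety factor."""

    def __init__(self, window=16, safety_factor=0.9):
        # Keep only the most recent `window` timings (in seconds).
        self.samples = deque(maxlen=window)
        self.safety_factor = safety_factor

    def record(self, kernel_seconds):
        """Store one measured kernel execution time."""
        self.samples.append(kernel_seconds)

    def sleep_seconds(self):
        """Return how long to sleep before enqueuing the buffer read."""
        if not self.samples:
            return 0.0  # nothing measured yet, so do not sleep
        average = sum(self.samples) / len(self.samples)
        return self.safety_factor * average


# Assumed example timings of 28 ms per kernel execution.
estimator = SleepEstimator(window=4, safety_factor=0.9)
for measured in (0.028, 0.028, 0.028, 0.028):
    estimator.record(measured)
sleep_for = estimator.sleep_seconds()  # about 0.0252 s
```

In the host loop, the thread would sleep for `sleep_for` seconds (for example with `time.sleep`) between enqueuing the kernel and enqueuing the buffer read, so the blocking read only polls for the remaining ~10% of the kernel run.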
Ideally, the kernel execution time should be re-measured periodically by disabling the sleep for a couple of kernel executions every few seconds. This ensures that if the kernel execution time drops below the sleep duration (e.g. because the user increased the power limit of the GPU), the decrease is properly detected.
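One possible way to schedule the periodic re-measurement is sketched below. The class `CalibrationSchedule`, its parameters, and the timestamps are hypothetical and chosen only for illustration. The caller passes in the current time, and the schedule answers whether the next kernel launch should skip the sleep and be timed instead.

```python
class CalibrationSchedule:
    """Hypothetical helper: decides per launch whether to sleep or to measure."""

    def __init__(self, interval_seconds=5.0, calibration_runs=3):
        if calibration_runs < 1:
            raise ValueError("calibration_runs must be at least 1")
        self.interval_seconds = interval_seconds
        self.calibration_runs = calibration_runs
        self.remaining = calibration_runs  # start with a measurement phase
        self.last_calibration = None       # time the last phase finished

    def should_measure(self, now):
        """Return True if this launch should run without sleep and be timed."""
        # Start a new measurement phase once the interval has elapsed.
        if self.remaining == 0 and now - self.last_calibration >= self.interval_seconds:
            self.remaining = self.calibration_runs
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.last_calibration = now
            return True
        return False


# Assumed timestamps in seconds, one per kernel launch.
schedule = CalibrationSchedule(interval_seconds=5.0, calibration_runs=2)
decisions = [schedule.should_measure(t) for t in (0.0, 1.0, 2.0, 3.0, 6.0, 7.0, 8.0)]
# decisions == [True, True, False, False, True, True, False]
```

In a real loop, `now` would come from a monotonic clock such as `time.monotonic()`, and the timings collected during the measurement launches would replace the previous average used to derive the sleep duration.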
Tests on an RTX 3080 Ti show that a sleep duration of 25 ms reduces the CPU load to 10% while maintaining the initialization speed. Increasing the sleep duration further to 30 ms reduces the CPU load to 2.5%, but it lowers the initialization speed from 3.30 MiB/s to 2.75 MiB/s.
| Sleep (ms) | CPU load (%) | Init. speed (MiB/s) |
| --- | --- | --- |
| 0 | 100 | 3.30 |
| 25 | 10 | 3.30 |
| 30 | 2.5 | 2.75 |