[BUG] OOM on memory pool expansion #496
Comments
Let me take a look.
So, there were a few problems here, but I learned something: The reason for the OOM on that pool growth is also unknown -- the logic for growing the pool was like this:
This was falling into the
This will make a difference. For example, if the current pool is 16GiB and you try to allocate 9GiB and it doesn't fit, it will grow the pool by 9GiB rather than by ~16GiB. That 7GiB difference could decide whether another program on the machine (or a library in your app that doesn't use RMM) can run or not, so I think being less greedy here is better. PR on the way. There were also some issues with passing default parameters from Python (and there still are).
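To make the arithmetic in the comment above concrete, here is a minimal sketch comparing the two growth policies. It is not RMM's actual code: the function names `grow_by_pool_size` and `grow_by_request` are hypothetical, the "greedy" policy is only an assumed approximation of the old behavior (growing by about the current pool size), and the 32510MiB capacity is borrowed from the bug report purely for illustration. Fragmentation and other processes on the GPU are ignored.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def grow_by_pool_size(pool_size: int, request: int) -> int:
    """Hypothetical greedy policy: grow by the current pool size,
    or by the request if the request is larger."""
    return max(pool_size, request)


def grow_by_request(pool_size: int, request: int) -> int:
    """Hypothetical conservative policy: grow only by the requested size."""
    return request


def fits(pool_size: int, growth: int, capacity: int) -> bool:
    """True if the pool after growing still fits in device memory
    (assumes nothing else is using the device)."""
    return pool_size + growth <= capacity


pool = 16 * GiB          # current pool size from the example above
request = 9 * GiB        # allocation that does not fit in the pool
capacity = 32510 * MiB   # device memory figure taken from the bug report

greedy = grow_by_pool_size(pool, request)
conservative = grow_by_request(pool, request)

print("greedy growth (GiB):", greedy // GiB)
print("conservative growth (GiB):", conservative // GiB)
print("difference (GiB):", (greedy - conservative) // GiB)
print("greedy fits:", fits(pool, greedy, capacity))
print("conservative fits:", fits(pool, conservative, capacity))
```

Under these assumptions the greedy policy asks for a 32GiB pool, which exceeds the device's 32510MiB, while the conservative policy asks for 25GiB and succeeds.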
Describe the bug
When the new memory pool expands, it appears it may be expanding by too much. As a result, one can run into an OOM error. In this case the GPU has 32510MiB available, yet making two 8GB allocations results in an OOM on the second allocation.
Steps/Code to reproduce bug
Expected behavior
The pool would expand such that the next allocation would fit without causing an OOM error.
Environment details (please complete the following information):
rmm/print_env.sh script to gather relevant environment details
Additional context
We seem to be running into some form of this in UCX-Py's benchmarking tests of late. This cropped up after PR ( #466 ), which resulted in Python now using the new pool memory allocator instead of CNMeM.