Bizarre runtime regression of PyString when iterating over dict of size >= 2^16 #3299
Comments
I suspect that this is due to the dict items being registered in the thread-local owned object pool so that our iterator can return bare references. You should be able to check this by iterating the dictionary with Python code, e.g. `py_run!(py, value, "collections.deque(value, maxlen=0)");`, instead of using our iterator.
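To make the mechanism concrete, here is a simplified, std-only model of a thread-local owned-object pool. It is a sketch under stated assumptions, not PyO3's code: `ObjRef`, `OWNED_OBJECTS`, `PoolGuard` and `iterate_registered` are hypothetical names, `Rc<dyn Any>` stands in for a Python object pointer, and releasing objects with `split_off` when the guard drops is an assumed simplification. It shows that an iterator which registers every item grows the pool by one entry per item, and nothing is released until the per-call guard goes away.

```rust
use std::any::Any;
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for a Python object pointer: reference counted and
// type-erased, like a PyObject*. This is NOT a PyO3 type.
type ObjRef = Rc<dyn Any>;

thread_local! {
    // Simplified model of the thread-local owned-object pool (assumption).
    static OWNED_OBJECTS: RefCell<Vec<ObjRef>> = RefCell::new(Vec::with_capacity(256));
}

/// Hypothetical per-call guard: remembers how long the pool was when created.
struct PoolGuard {
    start: usize,
}

impl PoolGuard {
    fn new() -> Self {
        PoolGuard { start: OWNED_OBJECTS.with(|o| o.borrow().len()) }
    }

    /// Keeps `obj` alive until this guard is dropped. An iterator that hands
    /// out bare references would call this once per item it yields.
    fn register(&self, obj: ObjRef) {
        OWNED_OBJECTS.with(|o| o.borrow_mut().push(obj));
    }
}

impl Drop for PoolGuard {
    fn drop(&mut self) {
        // Detach everything registered since `start`, then drop it after the
        // RefCell borrow ends; dropping each Rc decrements its reference count.
        let released = OWNED_OBJECTS.with(|o| {
            let mut o = o.borrow_mut();
            if self.start < o.len() { o.split_off(self.start) } else { Vec::new() }
        });
        drop(released);
    }
}

/// Hypothetical "dict iteration": every yielded item is registered in the pool.
fn iterate_registered(pool: &PoolGuard, items: &[ObjRef]) -> usize {
    let mut seen = 0;
    for item in items {
        pool.register(Rc::clone(item));
        seen += 1;
    }
    seen
}
```

In this model, iterating 2^16 items pushes 2^16 entries into the thread-local vector, and they are only decremented in one batch when the guard for the call is dropped.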
@adamreichold Thanks for the extra insight. I dug around (very briefly), and I'm still not clear on why the performance falls off a cliff rather than slowly degrading with the pool size. My bigger issue is why the performance remains degraded, which doesn't seem to make sense even if the pool is reused across calls. Perhaps this would be clearer if I understood what was actually slowing down. One aside: I tweaked my example a bit and noticed, similarly, that if I iterate over 2^16-1 items (the previous success case) but then allocate a few more strings with …
The pool itself is oblivious to the type of the objects; after all, it is only used to decrement their reference counts. So indeed, … The performance cliff you are seeing might be related to the cache hierarchy of the CPU you are running the benchmark on. So maybe the …
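One piece of the "why does it stay degraded" question can be illustrated in isolation, assuming the pool is backed by a growable `Vec` whose entries are released with `Vec::split_off`. The standard library documents that after `split_off` the original vector keeps its previous capacity, so once a single call has registered 2^16 objects, the backing buffer stays at least that large for the rest of the thread's life. Whether this is what actually causes the slowdown is not established in this thread; the sketch only demonstrates the capacity retention. `pool_after_call` is a hypothetical helper, and plain integers stand in for object pointers.

```rust
/// Hypothetical model of one call against a Vec-backed pool: register `n`
/// objects, release them with `split_off`, and report (len, capacity).
fn pool_after_call(pool: &mut Vec<usize>, n: usize) -> (usize, usize) {
    let start = pool.len();
    // "Register" n objects; integers stand in for object pointers.
    pool.extend(0..n);
    // Release everything registered since `start`. Per the std docs, the
    // original vector keeps its previous capacity after `split_off`.
    let released = pool.split_off(start);
    drop(released);
    (pool.len(), pool.capacity())
}
```

Calling `pool_after_call(&mut pool, 1 << 16)` and then `pool_after_call(&mut pool, 1)` leaves `len` at 0 both times, while the capacity stays at 2^16 or more, so a later one-object call still works against the enlarged buffer.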
@adamreichold Hmm, yeah, this is interesting. I think you're right about the architecture here: I have trouble getting such drastic performance results when running this on an x86 (cloud) instance, although the effect is still there by a small margin. It seems the M1 architecture is the main factor here. If it's of any interest, I did try this again with a local patch, where I run …
I think this only redistributes the cost of dropping the allocations backing the pool. It would be best to invest our efforts in removing this choke point entirely.
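As a rough illustration of what removing the choke point could look like, here is a plain-Rust sketch (an assumption about one possible design, not PyO3's API): instead of parking every yielded item in a thread-local pool and handing out bare references, the iterator yields owned, reference-counted handles that are released as soon as each loop iteration ends, so nothing accumulates regardless of dict size. `Dict` and `iter_owned` are hypothetical names, and `Rc<String>` stands in for a Python object handle.

```rust
use std::rc::Rc;

// Hypothetical dict model: values are reference-counted handles.
struct Dict {
    items: Vec<Rc<String>>,
}

impl Dict {
    /// Yields owned handles: each clone bumps the count, and the caller's
    /// handle is dropped (count decremented) at the end of each loop
    /// iteration, so no per-item entry is kept in any pool.
    fn iter_owned(&self) -> impl Iterator<Item = Rc<String>> + '_ {
        self.items.iter().map(Rc::clone)
    }
}
```

With this design, at most one extra reference per item is alive at a time during iteration, and the cost of releasing it is paid immediately rather than batched at the end of the call.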
@adamreichold Sounds reasonable, I appreciate the feedback.
Bug Description
Context: I am converting a function to PyO3 that reads through various types to create cache keys. Since the cache keys are performance-critical, I was profiling different input values and noticed that, after certain inputs, the performance of other inputs would change at runtime and stay changed for the lifetime of the program.
Specifically, the function returns in ~100ns when the input is `None`, and continues to behave this way when called again. If, however, a dict of size 2^16 is passed in and iterated over, the performance for the original `None` value suddenly drops by about 1.5 orders of magnitude, to ~4.5µs. The performance for this input never recovers for the rest of the program's lifetime (at that point it is slower than a pure Python version for this value).

I was able to narrow this down to a minimal reproducible case and found that the critical detail was whether `PyString::new` was called. Removing `PyString::new` made the performance behave consistently, as expected. Alternatively, if `PyString::new` is left in, removing the dict iteration also makes the performance behave consistently. Notably, for dicts of size 2^16-1 and below, this issue never happens (even when iterating and leaving the string in).

I am not clear how one function call can affect another in PyO3 like this, so it's difficult to debug further.
I see this behavior with both `opt-level=0` and `opt-level=3`.
Steps to Reproduce
Here is the minimal reproduction case:
Here is the test case:
1. Call the function with `None`, noting the performance.
2. Call it with a dict of size 2^16.
3. Call it with `None` again, noting the degraded performance.
4. Remove `for _ in value { }`. Rebuild the lib, repeat steps 1 - 3 above, noting the performance does not degrade.
5. Remove `PyString::new` on line 11. Rebuild the lib, repeat steps 1 - 3 above, noting the performance does not degrade.

Script below. Note that for dicts of size less than 2^16, the behavior is not triggered.
Backtrace
No response
Your operating system and version
macOS 13.4.1 (M1)
Your Python version (`python --version`)
3.11.4
Your Rust version (`rustc --version`)
rustc 1.70.0 (90c541806 2023-05-31)
Your PyO3 version
0.19.1
How did you install python? Did you use a virtualenv?
virtualenv + pyenv
maturin init
Additional Info
No response