1.3.0 PerformanceWarning: DataFrame is highly fragmented. #42477
Comments
Thanks for reporting this @xmatthias! Bisection indicates this was introduced in #38380 (appears to be intended, with a warning now given instead of automatic consolidation, cc @jbrockmendel).
Marking as a regression though, since I don't think this was a documented change.
Yes, this was intentional.
This is a bug that should be fixed.
If the .copy bug is fixed, then you should be fine if you do all your inserts and then do .copy(). A better option would be to use pd.concat to do it all at once.
Thank you @Alex-ley for this wonderful example!
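As a rough illustration of the pd.concat suggestion, the sketch below collects the new columns first and attaches them in a single step. The frame, the column names and the count of 150 columns are made up for the example and do not come from the thread.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data: a small base frame plus 150 derived columns.
base = pd.DataFrame({"a": np.arange(5)})

# Collect the new columns first instead of inserting them one by one.
new_columns = {f"col_{i}": base["a"] * i for i in range(150)}

with warnings.catch_warnings():
    # Escalate the fragmentation warning to an error so it cannot pass unnoticed.
    warnings.simplefilter("error", pd.errors.PerformanceWarning)
    # A single concat along the column axis adds every column in one step.
    result = pd.concat([base, pd.DataFrame(new_columns)], axis=1)

print(result.shape)  # (5, 151)
```

Because the warning filter is set to "error" inside the context manager, the snippet would raise if the concat step emitted a PerformanceWarning.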
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Minimal sample
Problem description
Since pandas 1.3.0, the above minimal sample produces a PerformanceWarning.
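The reporter's original sample is not shown here. As a hypothetical sketch of the pattern in question (single columns assigned one after the other), something like the following could be used to check whether a given pandas version emits the warning. The column names and the count of 150 are arbitrary assumptions, chosen only to be large.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5)})

with warnings.catch_warnings(record=True) as caught:
    # Record every warning, including repeats from the same code location.
    warnings.simplefilter("always")
    # Assign many columns one after the other (hypothetical column names).
    for i in range(150):
        df[f"col_{i}"] = df["a"] * i

# Keep only the warnings of the category discussed in this issue.
fragmented = [
    w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)
]
print(f"pandas {pd.__version__}: {len(fragmented)} PerformanceWarning(s) recorded")
```

The count printed depends on the installed pandas version, so no expected output is given.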
While I think I understand the warning, I don't understand how to mitigate it: the docs don't contain help that I could find, and the proposed solution (copy()) does not seem to work.
While this for sure isn't an ideal scenario (assignment of single columns one after the other), I also don't see how this can be changed in our use case.
The proposed df.copy() does not mitigate the warning - and the block count remains the same.
Based on my understanding, using df.loc[:, 'colname'] = is the recommended way to assign new columns.
This does create a new block for every insert - and df.copy() (which is proposed in the error) does not consolidate the blocks into 1 block - which means the error can't really be mitigated.
Strangely enough - the behaviour of df['colname'] = and df.loc[:, 'colname'] = is not identical - with the first triggering the PerformanceWarning - and the 2nd not triggering the warning (although the problem is still there in the background).
So this leaves me with a few questions
frame.copy() in the error does not do that)
Expected Output
Output of pd.show_versions()