Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC clarify inplace operation section in 1.5 whats_new #47433

Merged
merged 5 commits into from
Jun 30, 2022

Conversation

lesteve
Copy link
Contributor

@lesteve lesteve commented Jun 20, 2022

xref #47381

Coming from scikit-learn tests failing on scikit-learn with the pandas development version, I found the whats_new entry not very helpful at all. This does a few things:

  • use an example with different dtypes so that the behaviour is as advertised for old behaviour (i.e. ser is not updated in place). This was not the case before, since when dtype matches, ser is updated in place so the code snippet was not showing the right behaviour.
  • make it clearer that there is no behaviour change in pandas 1.5

Some of the wording may be not completely accurate, as I don't have a very good grasp of the pandas internals, feel free to suggest improvements!

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs some editing

i

@jreback jreback added the Docs label Jun 21, 2022
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know if this is an improvement. can you try to make a minimal change instead.


.. code-block:: ipython

In [3]: df.iloc[:, 0] = np.array([10, 11])
In [3]: df.iloc[:, 0] = np.array([10, 11], dtype=np.int32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this?

Copy link
Contributor Author

@lesteve lesteve Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the dtypes match the old (pandas < 1.5) behaviour is already to update the underlying array in place.

import numpy as np
import pandas as pd

print(f"{pd.__version__=}")
values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values)
ser = df[0]
print(f"before setting column\n{ser}")
df.iloc[:, 0] = np.array([10, 11])
print(f"after setting column\n{ser}")

Output:

pd.__version__='1.4.2'
before setting column
0    0
1    2
Name: 0, dtype: int64
after setting column
0    10
1    11
Name: 0, dtype: int64

This snippet was incorrect in pretending that ser was not updated in place.

I used a different dtype for the assignment rhs term to be in the case, where:

  • the old (pandas < 1.5) behaviour is not to update in place
  • the new (pandas 1.5) behaviour is the same with an additional warning
  • the future behaviour will be to update in place

In [4]: ser
Out[4]:
0 0
1 2
Name: 0, dtype: int64

This behavior is deprecated. In a future version, setting an entire column with
iloc will attempt to operate inplace.
*Behavior with pandas 1.5* is the same but you get a ``FutureWarning``:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave the original note, this is not inline with our styling.


*Future behavior*:

In a future version, setting an entire column with ``iloc`` will attempt to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is redundant

@@ -562,14 +562,15 @@ As ``group_keys=True`` is the default value of :meth:`DataFrame.groupby` and
raise a ``FutureWarning``. This can be silenced and the previous behavior
retained by specifying ``group_keys=False``.

.. _whatsnew_150.notable_bug_fixes.setitem_column_try_inplace:
.. _whatsnew_150.deprecations.setitem_column_try_inplace:
_ see also _whatsnew_130.notable_bug_fixes.setitem_column_try_inplace
Copy link
Contributor Author

@lesteve lesteve Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side-comment: the _ see also ... line is not rendered in the HTML, not sure whether it was supposed to be a link to https://pandas.pydata.org/docs/dev/whatsnew/v1.3.0.html#try-operating-inplace-when-setting-values-with-loc-and-iloc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it looks like it's supposed to reference that section. Mind correcting it? Can be in the body of the paragraph too like Please reference `this section <whatsnew_130.notable_bug_fixes.setitem_column_try_inplace>` of the 1.3 whatsnew file.

@lesteve
Copy link
Contributor Author

lesteve commented Jun 21, 2022

I have tried to make the diff smaller as advised

_ see also _whatsnew_130.notable_bug_fixes.setitem_column_try_inplace

Try operating inplace when setting values with ``loc`` and ``iloc``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Inplace operation when setting values with ``iloc``
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure but it seems the changes are only about .iloc so I remove the .loc mention

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this happens with .loc too when setting the entire column so reverting this change. In this case the warning is a bit misleading as it mentions .iloc

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, added couple of small things, but this makes sense.

If you want to spend the time, I think it'd help read this section is the example had some more "interesting" data. I think it's more difficult to understand the code with the original arbitrary code, and even more now, that one needs to understand what's going on with the types.

Maybe something like:

pandas.DataFrame({'price': [10, 15]}, index=['book1', 'boo2'])
new_prices = [11.50, 18.20]

Does this make sense?

@@ -595,7 +596,7 @@ iloc will attempt to operate inplace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth adding ...an entire column with different type with iloc... to the comment above?


*Future behavior*:
*New behavior*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what we're saying, this is not new behavior, but future after the deprecation, right? Or I'm missing something? Also, there is another Future behavior block above, that needs to be updated if I'm wrong. And the colon is there in the other behavior titles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that DataFrame.isetitem has been added in https://github.com/pandas-dev/pandas/pull/45333/files#diff-421998af5fe0fbc54b35773ce9b6289cae3e8ae607f81783af04ebf1fbcaf077R3690 so it only exists in pandas 1.5

@lesteve
Copy link
Contributor Author

lesteve commented Jun 22, 2022

If you want to spend the time, I think it'd help read this section is the example had some more "interesting" data.

I did this in my last commit, I also showed df.iloc[:, 0] which I think helps in understanding what is happening (whether the column keeps its dtype or not).

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems super clear to me now, thanks a lot @lesteve

@simonjayhawkins simonjayhawkins added this to the 1.5 milestone Jun 25, 2022
@mroeschke mroeschke merged commit 9a3231f into pandas-dev:main Jun 30, 2022
@mroeschke
Copy link
Member

Thanks @lesteve

@lesteve lesteve deleted the enh-inplace-iloc-whats-new branch July 1, 2022 06:31
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
)

* DOC make inplace operation section in whats_new clearer

* tweak

* Can happen with both loc and iloc when setting entire column

* Use more meaningful data

Co-authored-by: Matthew Roeschke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants