Avoiding DataFrame.apply unintended side effect when result_type is not specified. #24614

kefirbandi · 2019-01-04T14:21:29Z

According to the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html)

"In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects..."

It is indeed documented, but it took me several hours to trace a bug back to this "feature".

So I think it would be cleaner either to fully support side effects in apply (e.g. by calling func on a copy of the first column/row during the testing phase) or to ban them completely, if that is technically possible.

I know there are plans to ban modification when using groupby.apply (#12653).
I don't see any issue with mutation inside a (non-groupby) apply per se, but I may be wrong.

I should also point out that the note quoted above from the docs is not entirely correct: if result_type is specified, the first row/column is not necessarily processed twice.
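
To illustrate the "call func on a copy during the testing phase" idea, here is a minimal pure-Python sketch. It is not pandas code and makes no claim about how pandas implements apply: `apply_rows`, `probe_on_copy` and `double_a` are hypothetical names, and plain dicts stand in for rows.

```python
import copy


def apply_rows(rows, func, probe_on_copy=True):
    """Toy stand-in for an apply that probes func on the first row (hypothetical)."""
    if not rows:
        return []
    # Testing phase: call func once, e.g. to inspect what kind of value it returns.
    probe_target = copy.deepcopy(rows[0]) if probe_on_copy else rows[0]
    func(probe_target)
    # Real pass: every row, including the first one, is processed.
    return [func(row) for row in rows]


def double_a(row):
    # Side effect: mutates the row it is given.
    row["A"] *= 2
    return row


# Probing the original first row applies the side effect to it twice.
unsafe = [{"A": 1}, {"A": 1}]
apply_rows(unsafe, double_a, probe_on_copy=False)
print(unsafe)  # [{'A': 4}, {'A': 2}]

# Probing a copy leaves the original first row untouched until the real pass.
safe = [{"A": 1}, {"A": 1}]
apply_rows(safe, double_a, probe_on_copy=True)
print(safe)  # [{'A': 2}, {'A': 2}]
```

The cost of this approach is one extra copy of the first row per call, which is small compared with copying every row.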

dsaxton · 2019-01-04T19:14:48Z

It seems to me that applying a function with side effects across an entire DataFrame is something you'd almost never want to do, so disallowing it to prevent misuse would make sense. I don't know how you'd determine on-the-fly if an arbitrary function has side effects though.

kefirbandi · 2019-01-05T06:51:08Z

The way I use an apply function with side effects is to modify a row in place.
E.g.

def apply_function(row):
    row['A'] *= 2

df.apply(apply_function, axis=1)

In this case I don't even need any return value.

In some other cases my code looks like

def apply_function(row):
    row['A'] *= 2
    return row

df2 = df.apply(apply_function, axis=1)

In the second case I could easily avoid side effects by copying the row before modifying it, but that would cost efficiency, and I could no longer use the same function for in-place modification.
So my point is that I see a legitimate use case for a function with side effects.

As for the other direction, disallowing it completely: that is definitely tricky, if not impossible. One idea I had (which may or may not be feasible) is to set a "no-modification" flag in the apply method, which every mutating method would check before actually modifying the DataFrame.
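
A rough sketch of what such a "no-modification" flag could look like, in pure Python rather than pandas. Everything here is hypothetical (`GuardedRow`, `locked`, `guarded_apply`); it only shows the mechanism of a flag that the mutating method checks, and it guards item assignment only.

```python
from contextlib import contextmanager


class GuardedRow:
    """Hypothetical row container whose writes can be switched off temporarily."""

    def __init__(self, data):
        self._data = dict(data)
        self._locked = False

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # The mutating method checks the flag before changing anything.
        if self._locked:
            raise ValueError("row is read-only while apply is running")
        self._data[key] = value

    @contextmanager
    def locked(self):
        # Set the flag for the duration of the block, then restore it.
        previous = self._locked
        self._locked = True
        try:
            yield self
        finally:
            self._locked = previous


def guarded_apply(rows, func):
    """Call func on each row with the no-modification flag set."""
    results = []
    for row in rows:
        with row.locked():
            results.append(func(row))
    return results


def mutating(row):
    row["A"] *= 2  # reads row["A"], then tries to write it back


def pure(row):
    return row["A"] * 2


rows = [GuardedRow({"A": 1}), GuardedRow({"A": 3})]
doubled = guarded_apply(rows, pure)
print(doubled)  # [2, 6]

try:
    guarded_apply(rows, mutating)
    blocked = False
except ValueError:
    blocked = True
print(blocked)  # True
```

NumPy arrays offer a comparable switch through `ndarray.setflags(write=False)`. Whether a flag like this could be threaded through every DataFrame mutation path is exactly the open question raised above.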

fxjung · 2020-04-01T14:01:25Z

This note is missing from the current version of the docs. Has the issue been resolved?

fxjung · 2020-04-01T14:12:54Z

Interestingly:

import pandas as pd

df = pd.DataFrame({"A": {"x": 1, "y": 1}, "B": {"x": 1, "y": 1}})


def apply_function(x):
    x["A"] *= 2


df.apply(apply_function, axis=1)
print(df)

yields

   A  B
x  1  1
y  1  1

while

df = pd.DataFrame({"A": {"x": 1, "y": 1}, "B": {"x": 1, "y": 1}})


def apply_function(x):
    x["x"] *= 2


df.apply(apply_function, axis=0)
print(df)

yields

   A  B
x  2  2
y  1  1

rhshadrach · 2022-04-16T14:44:51Z

Closed by #39762.

mroeschke added the Apply (Apply, Aggregate, Transform, Map) label Jan 13, 2019

mroeschke added the Bug label Jun 28, 2020

leohazy mentioned this issue Jul 8, 2020: BUG:sort_values in groupby make some value lost #35137 (closed)

mroeschke added the Deprecate (Functionality to remove in pandas) label and removed the Bug label Jun 25, 2021

rhshadrach closed this as completed Apr 16, 2022
