Avoiding DataFrame.apply unintended side effect when result_type is not specified. #24614
Comments
It seems to me that applying a function with side effects across an entire |
The way I use an apply-function with side effects is modifying a row in-place.
    def apply_function(row):
        row['A'] *= 2
    df.apply(apply_function, axis=1)
In this case I don't even need any return value. In some other cases my code looks like
    def apply_function(row):
        row['A'] *= 2
        return row
    df2 = df.apply(apply_function, axis=1)
In the second case I could easily avoid side effects by copying the row before modifying it, but that would cost efficiency, and I would no longer be able to use the same function for in-place modification. As for the other direction, disallowing it completely: it is definitely tricky, if not impossible, to do. One idea I had (which may or may not be feasible) is to set a "no-modification" flag in the |
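The copy-before-modify variant mentioned above can be sketched as follows. This is a minimal illustration, not a recommendation from the thread: the function name `double_a` and the sample data are assumptions made up for the example.

```python
import pandas as pd


def double_a(row):
    # Copy first, so the object handed in by apply is never mutated.
    out = row.copy()
    out["A"] = out["A"] * 2
    return out


# Hypothetical sample data for the illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = df.apply(double_a, axis=1)

print(df)   # the original frame
print(df2)  # the transformed frame
```

The trade-off is the one described in the comment: each call pays for a copy, and the same function can no longer be reused for in-place modification.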
In the current version of the docs this is missing. Has it been resolved? |
Interestingly:
    df = pd.DataFrame({"A": {"x": 1, "y": 1}, "B": {"x": 1, "y": 1}})
    def apply_function(x):
        x["A"] *= 2
    df.apply(apply_function, axis=1)
    print(df)
yields
while
    df = pd.DataFrame({"A": {"x": 1, "y": 1}, "B": {"x": 1, "y": 1}})
    def apply_function(x):
        x["x"] *= 2
    df.apply(apply_function, axis=0)
    print(df)
yields
|
Closed by #39762. |
According to the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html)
"In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects..."
Well, it definitely is there in the docs, but it took me several hours to trace the bug down to this "feature".
So I think it would be cleaner either to fully support side effects in apply (e.g. by calling func on a copy of the first column/row in the testing phase) or to ban them completely, if technically possible.
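To make the "call func on a copy in the testing phase" idea concrete, here is a toy sketch. It is not pandas' implementation: `toy_apply_rows`, its `probe_on_copy` parameter, and the sample data are all hypothetical, and the helper only imitates a trial call followed by the real pass.

```python
import pandas as pd


def toy_apply_rows(df, func, probe_on_copy=True):
    """Toy stand-in for a row-wise apply with a trial call on the first row."""
    # Give every row its own independent Series so the toy never touches df.
    rows = [df.iloc[i].copy() for i in range(len(df))]
    if rows:
        # Trial call: either on a throwaway copy or on the real first row.
        probe = rows[0].copy() if probe_on_copy else rows[0]
        func(probe)  # result discarded, as in a "which code path?" test
    # Real pass: every row is handed to func exactly once.
    return pd.Series([func(r) for r in rows], index=df.index)


def double_a_in_place(row):
    row["A"] *= 2  # side effect on the row object itself
    return row["A"]


df = pd.DataFrame({"A": [1, 2], "B": [10, 20]})
safe = toy_apply_rows(df, double_a_in_place, probe_on_copy=True)
leaky = toy_apply_rows(df, double_a_in_place, probe_on_copy=False)
print(safe.tolist())   # [2, 4]
print(leaky.tolist())  # [4, 4]: the first row was doubled twice
```

Probing a copy only shields mutations of the row itself. Side effects on outside state, such as appending to a list or writing a file, would still happen once more than expected.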
I know there are plans to ban modification when using groupby.apply (#12653). I don't see any issues with mutation inside a (non-groupby) apply per se, but I may be wrong.
I also have to note that the above note from the docs is not entirely correct. If result_type is specified, the first row/column is not necessarily processed twice.
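One way to check this on an installed pandas version is to record which rows func is called with, once with the default and once with `result_type` set. The sketch below assumes nothing about the outcome, since the call counts depend on the pandas version. The helper `count_calls` and the sample data are made up for illustration.

```python
import pandas as pd


def count_calls(df, **apply_kwargs):
    """Return the row labels that func was called with, in call order."""
    seen = []

    def func(row):
        seen.append(row.name)  # harmless side effect: log the row label
        return [row["A"], row["B"]]

    df.apply(func, axis=1, **apply_kwargs)
    return seen


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])
calls_default = count_calls(df)
calls_expand = count_calls(df, result_type="expand")

# A label appearing more often than once means func ran repeatedly on that row.
print("default:", calls_default)
print("expand: ", calls_expand)
```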