-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: Behavior change with empty apply in pandas 1.3.0rc1 #41997
Comments
@simonjayhawkins I've relabeled, since this is an indexing issue, not related to apply. The RHS is
This is basically the case we have changed the behavior for. df = pd.DataFrame(columns=["a", "b"])
rhs = pd.DataFrame(columns=["a", "b"])
df["a"] = rhs I think this behaves now as expected. @aberres The note you are looking for is the one for #38604 in the whatsnew Edit: The idea behind this is, that the number of columns has to match. Previously we assigned always the first column and silently dropped the rest. |
Don't think behavior based on data is desirable. I am not sure what you want to achieve here if this works for empty dfs. When you know, that your DataFrame is empty why bothering with the setitem call? If it is not empty, this will raise even if we would allow empty frames @jbrockmendel thoughts? |
Hm maybe I was wrong above and this is an apply issue. df = pd.DataFrame([[1, 2]], columns=["a", "b"])
df.apply(lambda x: x["a"], axis=1)
0 1
dtype: int64
df = pd.DataFrame(columns=["a", "b"])
df.apply(lambda x: x["a"], axis=1)
Empty DataFrame
Columns: [a, b]
Index: [] Looks inconsistent, but I am not familiar enough with apply to asses this. Nevertheless I am against catching the incosistency here in setitem. You can get into all sorts of trouble, if you expect a Series but receive a DataFrame. |
Yeah, as you noticed in the "not empty case" only a single column is returned and the assignment works fine. Technically it is not wrong to raise in the empty case. It is just a change in behavior which might or might not cause problems. |
Agreed
Yah seems like we should get back and empty Series right? |
yes would have expected that as well |
I don't know how we can determine what we should get back when we have nothing to go on. Consider:
How can we tell the difference between these internally? It seems to me the only options are to either raise directly or return the df as-is. Edit: But agreed this is not an issue with setitem. While I see this behavior as undesired, I don't see a way to avoid it. |
Looking at this again, If we were able to pass in an empty Series with the correct index values ("a" and "b"), we could differentiate between these two. However as far as I know, that currently is not possible. |
I am also experiencing the same issue. Example below.
I would expect 'x' in this case to be an empty Series, but instead it is returning an empty DataFrame. When as Any ideas on expected resolution? |
changing milestone to 1.3.5 |
Just reported a related issue with
Lines 779 to 782 in 22de58e
In my opinion, this inconsistency is actually caused by another inconsistency:
The issue is that semantics are different between these two cases. In the former, keys exist and are the same as the DataFrame columns. In the latter, keys don't exist - we get a different data structure. Option 1 Option 2
Thoughts? |
Hi @rhshadrach any new comments on the above? Happy to make an PR is this sounds reasonable. |
@gshaikov - Option 2 doesn't seem viable to me. There is no way for a user to differentiate between a Series of all None values and an empty Series. I would also describe that user experience as being "quirky". Option 1 does seem better than the current behavior - it would allow for the user to handle an empty DataFrame as they see fit. However, it's tempting to think of apply as "for loop over the rows/columns", and it violates this viewpoint as an empty frame should result in calling the UDF 0 times (which is currently violated in today's implementation too!). With this, it doesn't seem like a clear win to me and I'd like to hear what others might think. |
Looking at this issue from purely a backport fix perspective, I doesn't appear that we have any solutions here (for 1.3.x). Changing the behavior of apply would not be suitable for a backport. For backport, we would either need to:
I think option 2 has been ruled out #41997 (comment) and #41997 (comment) I think option 1 is undesirable (especially late in the 1.3.x branch) since the change was a bugfix and is now correct behavior. #41997 (comment) I propose we remove this from the 1.3.5 milestone. |
removing issue from 1.3.x milestone. Any changes to address the apply issue would not be backported. |
As I mentioned in #47966 this behavior was previously documented and at least should be documented again. |
Problem description
The following (toy) snippet worked with 1.2:
With 1.3 it fails with
ValueError: Columns must be same length as key
Technically this is correct - the apply on an empty frame returns the empty frame so things do not really match.
Expected Output
It still works? Just reporting it here if this is an unintended change. Maybe I missed it, but I did not see this mentioned in the changelog.
The fix is to only call apply when the frame is not empty I guess? I stumbled upon this one when running our test suite.
The text was updated successfully, but these errors were encountered: