REGR: Behavior change with empty apply in pandas 1.3.0rc1 #41997

aberres · 2021-06-14T10:26:05Z

Problem description

The following (toy) snippet worked with 1.2:

df = pd.DataFrame(columns=["a", "b"])
df["a"] = df.apply(lambda x: x["a"], axis=1)

With 1.3 it fails with ValueError: Columns must be same length as key

Technically this is correct - the apply on an empty frame returns the empty frame so things do not really match.

Expected Output

It still works? Just reporting it here if this is an unintended change. Maybe I missed it, but I did not see this mentioned in the changelog.

The fix is to only call apply when the frame is not empty I guess? I stumbled upon this one when running our test suite.


simonjayhawkins · 2021-06-14T11:56:16Z

Thanks @aberres for the report

Just reporting it here if this is an unintended change. Maybe I missed it, but I did not see this mentioned in the changelog.

first bad commit: [caf81fa] BUG: DataFrame.setitem not raising ValueError when rhs is df and has wrong number of columns (#39341)

cc @phofl

phofl · 2021-06-14T12:03:17Z

@simonjayhawkins I've relabeled, since this is an indexing issue, not related to apply.

The RHS is

Empty DataFrame
Columns: [a, b]
Index: []

This is basically the case we have changed the behavior for.

df = pd.DataFrame(columns=["a", "b"])
rhs = pd.DataFrame(columns=["a", "b"])
df["a"] = rhs

I think this behaves now as expected.

@aberres The note you are looking for is the one for #38604 in the whatsnew

Edit: The idea behind this is that the number of columns has to match. Previously we always assigned the first column and silently dropped the rest.
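To make the column-count rule concrete, here is a minimal sketch, assuming pandas 1.3 or later. The helper name try_setitem is made up for illustration and is not part of pandas; it only reports whether the assignment was accepted.

```python
import pandas as pd


def try_setitem(df, key, rhs):
    """Hypothetical helper: attempt df[key] = rhs and report the outcome.

    Returns None on success, or the ValueError message if pandas refuses.
    """
    try:
        df[key] = rhs
    except ValueError as exc:
        return str(exc)
    return None


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
rhs = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

# One key, two columns on the right-hand side: rejected in pandas >= 1.3
# (older versions silently assigned only the first column).
two_col_error = try_setitem(df, "a", rhs)

# One key, one Series on the right-hand side: accepted.
one_col_error = try_setitem(df, "a", rhs["a"])
```

The exact error message may differ between versions, so the sketch only checks whether a ValueError was raised.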

aberres · 2021-06-14T12:10:13Z

@aberres The note you are looking for is the one for #38604 in the whatsnew

Thanks, makes sense.

Not sure if this happens a lot out in the wild, but maybe this case should be allowed for empty columns? I added if not df.empty in my code, which worked fine, as expected.
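For anyone hitting the same thing, the guard can be sketched roughly like this. The function name copy_a_rowwise is hypothetical, and the lambda is the toy one from the report.

```python
import pandas as pd


def copy_a_rowwise(df):
    """Hypothetical helper: recompute column "a" row by row, skipping empty frames."""
    if df.empty:
        # Nothing to compute; avoid apply() and the setitem call entirely.
        return df
    df["a"] = df.apply(lambda row: row["a"], axis=1)
    return df


empty = copy_a_rowwise(pd.DataFrame(columns=["a", "b"]))
filled = copy_a_rowwise(pd.DataFrame([[1, 2]], columns=["a", "b"]))
```

The empty frame comes back untouched, and the non-empty frame goes through apply as before.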

phofl · 2021-06-14T13:01:52Z

Don't think behavior based on data is desirable. I am not sure what you want to achieve here if this works for empty dfs. When you know that your DataFrame is empty, why bother with the setitem call? If it is not empty, this will raise even if we would allow empty frames

@jbrockmendel thoughts?

phofl · 2021-06-14T13:08:49Z

Hm maybe I was wrong above and this is an apply issue.

df = pd.DataFrame([[1, 2]], columns=["a", "b"])
df.apply(lambda x: x["a"], axis=1)
0    1
dtype: int64


df = pd.DataFrame(columns=["a", "b"])
df.apply(lambda x: x["a"], axis=1)

Empty DataFrame
Columns: [a, b]
Index: []

Looks inconsistent, but I am not familiar enough with apply to assess this.

Nevertheless I am against catching the inconsistency here in setitem. You can get into all sorts of trouble if you expect a Series but receive a DataFrame.

aberres · 2021-06-14T13:20:30Z

If it is not empty, this will raise even if we would allow empty frames

Yeah, as you noticed in the "not empty case" only a single column is returned and the assignment works fine.

Technically it is not wrong to raise in the empty case. It is just a change in behavior which might or might not cause problems.

jbrockmendel · 2021-06-14T16:48:30Z

Nevertheless I am against catching the inconsistency here in setitem. You can get into all sorts of trouble

Agreed

df = pd.DataFrame(columns=["a", "b"])
df.apply(lambda x: x["a"], axis=1)

Empty DataFrame
Columns: [a, b]
Index: []
Looks inconsistent, but I am not familiar enough with apply to assess this.

Yah seems like we should get back an empty Series, right?

phofl · 2021-06-14T16:49:59Z

Yah seems like we should get back an empty Series, right?

yes would have expected that as well

rhshadrach · 2021-06-25T20:59:30Z

I don't know how we can determine what we should get back when we have nothing to go on. Consider:

df = pd.DataFrame(columns=["a", "b"])
df.apply(lambda x: x["a"], axis=1)
df.apply(lambda x: x[["a", "b"]], axis=1)

How can we tell the difference between these internally? It seems to me the only options are to either raise directly or return the df as-is.
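On a frame that does have rows, the two calls are easy to tell apart from their results, which is the information that is missing in the empty case. A small sketch, using a one-row frame like the one earlier in this thread:

```python
import pandas as pd

df = pd.DataFrame([[1, 2]], columns=["a", "b"])

# Each row maps to a scalar, so the results are stacked into a Series.
scalar_per_row = df.apply(lambda x: x["a"], axis=1)

# Each row maps to a two-element Series, so the results form a DataFrame.
series_per_row = df.apply(lambda x: x[["a", "b"]], axis=1)

print(type(scalar_per_row).__name__, type(series_per_row).__name__)
```

With zero rows there is no per-row result to inspect, so the output shape has to be guessed.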

Edit: But agreed this is not an issue with setitem. While I see this behavior as undesired, I don't see a way to avoid it.

rhshadrach · 2021-07-18T16:15:03Z

Looking at this again, pandas.core.apply.FrameApply.apply_empty_result is attempting to pass an empty Series to discern the shape of the output. However, because that empty Series is not able to have index values, the lambda in this issue fails and this method falls back to just returning the entire object itself.

If we were able to pass in an empty Series with the correct index values ("a" and "b"), we could differentiate between these two. However as far as I know, that currently is not possible.
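A rough illustration of both points, written against the public Series constructor rather than the internals. The probe below is only an approximation of what the empty-frame path passes in, and the variable names are made up for this sketch.

```python
import pandas as pd

udf = lambda x: x["a"]

# Approximation of the empty-frame probe: a Series with no labels at all.
probe = pd.Series([], dtype="float64")
try:
    udf(probe)
    probe_failed = False
except KeyError:
    # The label "a" does not exist, so the lookup fails.
    probe_failed = True

# Giving the Series the labels "a" and "b" forces it to hold two values
# (NaN here), so it is no longer empty.
labelled = pd.Series(index=["a", "b"], dtype="float64")
print(probe_failed, len(labelled))
```

So a Series can be empty or carry the column labels, but not both at once.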

jordantshaw · 2021-07-27T17:05:35Z

I am also experiencing the same issue. Example below.

df = pd.DataFrame(columns=["a", "b"])
df['a'] = df.apply(lambda x: x["a"], axis=1)

I would expect 'x' in this case to be an empty Series, but instead it is returning an empty DataFrame. Any ideas on the expected resolution?

simonjayhawkins · 2021-10-16T19:08:22Z

changing milestone to 1.3.5

gshaikov · 2021-10-18T12:09:08Z

Just reported a related issue with apply swallowing exceptions: #44031

apply explicitly ignores all exceptions when processing an empty DataFrame. This leads to apply returning an empty DataFrame, which is an inconsistency, as it was already pointed out.

pandas/pandas/core/apply.py

Lines 779 to 782 in 22de58e

    
try:
    r = self.f(Series([], dtype=np.float64))
except Exception:
    pass

In my opinion, this inconsistency is actually caused by another inconsistency:

When DataFrame has rows, apply(..., axis=1) takes each row as Series.
However, when DataFrame is empty, we pass empty series.

The issue is that semantics are different between these two cases. In the former, keys exist and are the same as the DataFrame columns. In the latter, keys don't exist - we get a different data structure.

Option 1
We don't swallow exceptions in apply on empty DataFrame. This would be equivalent to the behaviour on non-empty DataFrame. We let the user handle empty Series vs non-empty Series in their apply function.

Option 2
In pandas, a Series with keys and no data is not possible. To avoid this inconsistency then we could pass a series with keys and None as values. Semantically this would be closer to passing a Series with no data than the current solution with empty series:

keys are present, no exception raised on indexing
indexing returns None, should be handled by the user.
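Option 2 could look roughly like the sketch below. This is not how pandas works today; probe_with_none is a hypothetical helper that only shows what the UDF would see under this proposal.

```python
import pandas as pd


def probe_with_none(func, columns):
    """Hypothetical probe: call func on a row whose values are all None."""
    placeholder = pd.Series([None] * len(columns), index=columns, dtype=object)
    return func(placeholder)


cols = ["a", "b"]
scalar_probe = probe_with_none(lambda x: x["a"], cols)
series_probe = probe_with_none(lambda x: x[["a", "b"]], cols)

# The scalar lookup yields None and the list lookup yields a Series, so the
# two UDFs from this thread become distinguishable.
print(scalar_probe is None, isinstance(series_probe, pd.Series))
```

A UDF that does arithmetic on the values would still fail on None, which is the part the user would have to handle.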

Thoughts?

gshaikov · 2021-11-07T15:43:53Z

Hi @rhshadrach any new comments on the above? Happy to make a PR if this sounds reasonable.

rhshadrach · 2021-11-19T22:42:12Z

@gshaikov - Option 2 doesn't seem viable to me. There is no way for a user to differentiate between a Series of all None values and an empty Series. I would also describe that user experience as being "quirky".

Option 1 does seem better than the current behavior - it would allow for the user to handle an empty DataFrame as they see fit. However, it's tempting to think of apply as "for loop over the rows/columns", and it violates this viewpoint as an empty frame should result in calling the UDF 0 times (which is currently violated in today's implementation too!). With this, it doesn't seem like a clear win to me and I'd like to hear what others might think.
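One way to check the "for loop" view on a given installation is to count the UDF calls directly. This is a minimal sketch; count_udf_calls is a made-up helper, and the count for the empty frame is deliberately not stated because it depends on the pandas version.

```python
import pandas as pd


def count_udf_calls(df):
    """Count how often apply(axis=1) invokes the UDF on the given frame."""
    calls = 0

    def udf(row):
        nonlocal calls
        calls += 1
        return 0

    df.apply(udf, axis=1)
    return calls


three_rows = count_udf_calls(pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}))
empty = count_udf_calls(pd.DataFrame(columns=["a", "b"]))

# A pure "loop over rows" would give 3 and 0; the empty-frame count depends
# on the probing call described earlier in this thread.
print(three_rows, empty)
```

If the second number is not 0, the UDF is being invoked even though there are no rows to loop over.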

simonjayhawkins · 2021-11-27T12:52:55Z

Looking at this issue from purely a backport fix perspective, it doesn't appear that we have any solutions here (for 1.3.x).

Changing the behavior of apply would not be suitable for a backport.

For backport, we would either need to:

1. revert the change to setitem that caused the regression.
2. catch the apply inconsistency in setitem

I think option 2 has been ruled out #41997 (comment) and #41997 (comment)

I think option 1 is undesirable (especially late in the 1.3.x branch) since the change was a bugfix and is now correct behavior. #41997 (comment)

I propose we remove this from the 1.3.5 milestone.

simonjayhawkins · 2021-11-28T10:18:14Z

removing issue from 1.3.x milestone. Any changes to address the apply issue would not be backported.

emsi · 2022-08-05T07:48:30Z

As I mentioned in #47966 this behavior was previously documented and at least should be documented again.

aberres added the Bug and Needs Triage labels Jun 14, 2021

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jun 14, 2021

code sample for pandas-dev#41997

e14556a

simonjayhawkins added the Regression and Apply labels and removed the Needs Triage label Jun 14, 2021

simonjayhawkins added this to the 1.3 milestone Jun 14, 2021

phofl added the Indexing label and removed the Apply label Jun 14, 2021

phofl added the Apply label and removed the Indexing label Jun 14, 2021

simonjayhawkins mentioned this issue Jun 25, 2021

RLS: 1.3 #40169

Closed

simonjayhawkins changed the title ~~1.3: (intended?) Behavior change with empty apply~~ REGR: Behavior change with empty apply in pandas 1.3.0rc1 Jun 25, 2021

simonjayhawkins modified the milestones: 1.3, 1.3.1 Jun 30, 2021

simonjayhawkins modified the milestones: 1.3.1, 1.3.2 Jul 24, 2021

simonjayhawkins modified the milestones: 1.3.2, 1.3.3 Aug 15, 2021

simonjayhawkins modified the milestones: 1.3.3, 1.3.4 Sep 11, 2021

rhshadrach mentioned this issue Oct 16, 2021

BUG: apply swallows exceptions, shows inconsistent behaviour #44031

Closed

3 tasks

simonjayhawkins modified the milestones: 1.3.4, 1.3.5 Oct 16, 2021

simonjayhawkins modified the milestones: 1.3.5, Contributions Welcome Nov 28, 2021

rhshadrach mentioned this issue Jan 29, 2022

BUG: GROUPING UNPOPULATED DATAFRAME raises exception - index name clashes with duplicate column name #44350

Closed

3 tasks

phofl mentioned this issue Aug 4, 2022

BUG: DataFrame apply is swallowing exceptions #47966

Closed

3 tasks

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
