Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove failing tests #469

Merged
merged 1 commit into from
Feb 21, 2019
Merged

remove failing tests #469

merged 1 commit into from
Feb 21, 2019

Conversation

TomAugspurger
Copy link
Member

Theses tests were asserting that you could train on a dataframe and transform an array.

I don't really know how these worked in the past. Pandas and numpy unfortunately have different broadcasting behavior (which pandas is working to fix).

In [16]: x
Out[16]:
array([[ 1.2061934 ,  0.78132208],
       [ 0.14763367, -0.47647038],
       [ 0.37784154,  1.13200053],
       [ 1.58888898, -1.84890062]])

In [17]: s
Out[17]:
0    1.206193
1    0.781322
dtype: float64

In [18]: x * s.values
Out[18]:
array([[ 1.45490252,  0.61046419],
       [ 0.17807476, -0.37227683],
       [ 0.45574998,  0.88445701],
       [ 1.9165074 , -1.44458687]])

In [19]: x * s
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-b699db99ec8b> in <module>
----> 1 x * s

~/miniconda3/envs/pandas-0.23.4/lib/python3.6/site-packages/pandas/core/ops.py in wrapper(left, right)
   1069         result = safe_na_op(lvalues, rvalues)
   1070         return construct_result(left, result,
-> 1071                                 index=left.index, name=res_name, dtype=None)
   1072
   1073     return wrapper

~/miniconda3/envs/pandas-0.23.4/lib/python3.6/site-packages/pandas/core/ops.py in _construct_result(left, result, index, name, dtype)
    978     not be enough; we still need to override the name attribute.
    979     """
--> 980     out = left._constructor(result, index=index, dtype=dtype)
    981
    982     out.name = name

~/miniconda3/envs/pandas-0.23.4/lib/python3.6/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    260                             'Length of passed values is {val}, '
    261                             'index implies {ind}'
--> 262                             .format(val=len(data), ind=len(index)))
    263                 except TypeError:
    264                     pass

ValueError: Length of passed values is 4, index implies 2

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After running this test with a few recent commits in the Dask git history, this test starts breaking with commit dask/dask@79cb051. That commit fixed an issue where we were calling da.asarray internally in Dask which converted arrays of other types to NumPy arrays when we should have been calling da.asanyarray which preserves other array subclasses.

I suspect that's why this test was passing originally. Pandas Series were being converted to NumPy arrays and the broadcasting behavior worked.

I'm +1 for the changes in this PR

@TomAugspurger
Copy link
Member Author

Thanks for digging to find the actual change @jrbourbeau :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants