Series constructor doesn't handle object implementing numpy array interface #8311

Closed
dalejung opened this issue Sep 18, 2014 · 7 comments
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions

Comments

@dalejung
Contributor

In [1]: import pandas as pd
In [2]: import rpy2.robjects
In [7]: import numpy as np
In [4]: arr = rpy2.robjects.IntVector(range(10))
In [5]: arr
Out[5]:
<IntVector - Python:0x102ed3908 / R:0x105a7c178>
[       0,        1,        2, ...,        7,        8,        9]

In [6]: pd.Series(arr) # wrong behavior
Out[6]:
0    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
dtype: object

In [9]: np.array(arr)
Out[9]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

In [10]: pd.Series(np.array(arr)) # correct behavior
Out[10]:
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32

I guess this is a regression due to Series no longer being a subclass of np.ndarray. Not sure if numpy exposes a catchall isarray that does the checks.

@jreback
Contributor

jreback commented Sep 18, 2014

IIRC this is an issue with rpy2, you need a fairly recent version of it.

@dalejung
Contributor Author

hm. I'm on 2.4.3 which is the latest. I'm not seeing where in the constructor we'd be handling it, unless numpy itself is overloading the isinstance check. I gotta step out for a bit, but I'll dig into it when I get back.

I know this used to work before. I don't use rpy2 regularly... I'm just fixing some tests that have been stuck at red for a long time :)

@jreback
Contributor

jreback commented Sep 18, 2014

This doesn't really have anything to do with numpy. It's the rpy2.robjects.IntVector(range(10)).
What does Series not being a subclass have to do with anything here?

@dalejung
Contributor Author

Ah right. I meant that being an ndarray subclass would improve the chances of the data making it to pa.array untouched, which is what happens on 12.0.

The problem in master is:

# series.py 198
if index is None:
    if not is_list_like(data):
        data = [data]
    index = _default_index(len(data))

In 12.0 there is no data = [data] and the index check gets called after _sanitize_array anyways.
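
A rough illustration of why that wrapping step matters. This is only a sketch: ArrayOnly is a hypothetical stand-in (not rpy2's class), and is_list_like_sketch is a simplified assumption about an __iter__-based check, not pandas' actual implementation.

```python
import numpy as np


class ArrayOnly:
    """Hypothetical stand-in: exposes __array_interface__ but defines no __iter__."""

    def __init__(self, values):
        self._buf = np.asarray(values, dtype=np.int32)

    @property
    def __array_interface__(self):
        # Reuse the interface dict of the backing ndarray.
        return self._buf.__array_interface__


def is_list_like_sketch(obj):
    # Simplified assumption: "list-like" means it has __iter__ and is not a string.
    return hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes))


data = ArrayOnly(range(10))

# Mirrors the `data = [data]` branch: the whole object becomes one element.
wrapped = data if is_list_like_sketch(data) else [data]

# Going through the array interface instead keeps all ten values.
direct = np.asarray(data)
```

Here wrapped has length 1, while direct is a ten-element int32 array, which matches the difference between the two Series outputs above.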

So I'm thinking that we should have something like:

if hasattr(data, '__array_interface__') or hasattr(data, '__array_struct__'):
    data = pa.array(data)

I'm not entirely sure what the rpy2 vector could do; it doesn't pull the data from R until you treat it like a numpy array. But it does properly implement the interface. Something like

arr = rpy2.robjects.IntVector(range(10))
np.sum(arr) # 45

works fine. If we're cool with the array interface check, I'll throw up a PR tonight.
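
For reference, the proposed check could be sketched as a standalone helper. The names coerce_array_interface and InterfaceOnly are hypothetical and used only for illustration; this is not the code that went into pandas, and plain numpy stands in for pa here.

```python
import numpy as np


class InterfaceOnly:
    """Hypothetical stand-in for a foreign vector; not rpy2's actual class."""

    def __init__(self, values):
        self._buf = np.asarray(values, dtype=np.int32)

    @property
    def __array_interface__(self):
        # Reuse the interface dict of the backing ndarray.
        return self._buf.__array_interface__


def coerce_array_interface(data):
    """Return an ndarray if data exposes the array interface, else data unchanged."""
    if hasattr(data, "__array_interface__") or hasattr(data, "__array_struct__"):
        return np.asarray(data)
    return data


converted = coerce_array_interface(InterfaceOnly(range(10)))
untouched = coerce_array_interface("abc")
```

One thing to watch with a check like this: NumPy scalars also expose __array_interface__, so they would be turned into 0-d arrays unless scalars are handled before it.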

@jreback
Contributor

jreback commented Sep 18, 2014

You can check for '__array__', which is a more compatible way.

You can try it; I'm not sure if it will work.
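
Assuming the check meant here is for the __array__ method, a minimal sketch might look like the following. ArrayMethodOnly and coerce_if_array_method are hypothetical names for illustration, and whether rpy2 vectors actually define __array__ is not established in this thread.

```python
import numpy as np


class ArrayMethodOnly:
    """Hypothetical container that only exposes the __array__ method."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when asked to convert the object to an array.
        return np.array(self._values, dtype=dtype)


def coerce_if_array_method(data):
    """Convert to ndarray when the object advertises __array__, else pass through."""
    if hasattr(data, "__array__"):
        return np.asarray(data)
    return data


result = coerce_if_array_method(ArrayMethodOnly([1, 2, 3]))
passthrough = coerce_if_array_method(5)
```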

@dalejung
Contributor Author

yar, it was meant to be pseudo-ish, gotta look around to see if there's an established way to check. Will report back.

@jreback
Contributor

jreback commented Oct 20, 2015

closing, but pls reopen if the issue persists.

@jreback jreback closed this as completed Oct 20, 2015