-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Series Changes breaks rpy2? #5698
Comments
I think they might need to have a slightly different conversion function, can you post to their dev site? |
ok...well make this an API issue here to 'track' it. |
@jreback, this could be a big deal once 0.13.0 final is released. No word from rpy2? Moved to 0.13 to make final decision right before release is due. Would be good to preempt This also breaks the docs. |
@JanSchulz can u ping rpy2? |
cc @lgautier, we're anxious to avoid releasing a point release with no mitigation for this. |
could be as simple as checking for |
Hope it is, but it needs to happen on rpy2's side, no? |
@jreback why doesn't I think we might need to do that to be able to actually work with rpy2. |
not necessary as array is called but if u need it explicitly then go for it I don't have rpy so really can't even try it |
here is an example |
@JanSchulz could you post a really simple example? I can't seem to get your notebook to work. I think the fix for rpy2 is actually really simple. |
@jreback could we just do: def __array_interface__(self):
return self.__array__().__array_interface__() ? |
it's a property (don't need function call) |
@jtratner I also only worked with rpy2 the first time with that notebook. :-) The notebook is from @kevindavenport Here is a small notebook with three cells: --- setup ---
%load_ext rmagic
import pandas as pd
df = pd.DataFrame({"x":[1,2,3,4,5], "y":[1,2,3,2,1]})
vals = df.values
cols = df.columns
---
--- using df.values: works ---
%%R -i vals,cols # list object to be transferred to python here
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
library(ggplot2)
df = data.frame(vals)
names(df) <- c(cols)
plot = ggplot(df, aes(x = x, y = y)) +
geom_point(alpha = .8, color = 'dodgerblue',size = 5)
print(plot)
---
--- df directly: does not work ---
%%R -i df # list object to be transferred to python here
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
library(ggplot2)
df = data.frame(df)
plot = ggplot(df, aes(x = x, y = y)) +
geom_point(alpha = .8, color = 'dodgerblue',size = 5)
print(plot)
--- |
BTW: I don't think that adding a property is enough: currently the error happens because def numpy2ri(o):
""" Augmented conversion function, converting numpy arrays into
rpy2.rinterface-level R structures. """
if isinstance(o, numpy.ndarray):
# numpy handling
else:
res = ro.default_py2ri(o) As the pandas part ( I did a small change: def numpy2ri(o):
""" Augmented conversion function, converting numpy arrays into
rpy2.rinterface-level R structures. """
if isinstance(o, (numpy.ndarray, pd.Series)):
# numpy handling
else:
res = ro.default_py2ri(o) and I got the following error: ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-3c6069b19c27> in <module>()
----> 1 get_ipython().run_cell_magic(u'R', u'-i df # list object to be transferred to python here', u'install.packages("ggplot2") # Had to add this for some reason, shouldn\'t be necessary\nlibrary(ggplot2)\ndf = data.frame(df)\nplot = ggplot(df, aes(x = x, y = y)) + \ngeom_point(alpha = .8, color = \'dodgerblue\',size = 5)\nprint(plot)')
C:\portabel\Python27\lib\site-packages\IPython\core\interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
2141 magic_arg_s = self.var_expand(line, stack_depth)
2142 with self.builtin_trap:
-> 2143 result = fn(magic_arg_s, cell)
2144 return result
2145
C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.pyc in R(self, line, cell, local_ns)
C:\portabel\Python27\lib\site-packages\IPython\core\magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.pyc in R(self, line, cell, local_ns)
585 except KeyError:
586 raise NameError("name '%s' is not defined" % input)
--> 587 self.r.assign(input, self.pyconverter(val))
588
589 if getattr(args, 'units') is not None:
C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
29
30 def __call__(self, *args, **kwargs):
---> 31 new_args = [conversion.py2ri(a) for a in args]
32 new_kwargs = {}
33 for k, v in kwargs.iteritems():
C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
26 od[name] = StrVector(values)
27 else:
---> 28 od[name] = ro.conversion.py2ri(values)
29 return DataFrame(od)
30 elif isinstance(obj, PandasIndex):
C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
49 else:
50 # converted as a numpy array
---> 51 res = original_conversion(obj)
52 # "index" is equivalent to "names" in R
53 if obj.ndim == 1:
C:\portabel\Python27\lib\site-packages\rpy2\robjects\numpy2ri.py in numpy2ri(o)
35 if o.dtype.kind in _kinds:
36 # "F" means "use column-major order"
---> 37 vec = SexpVector(o.ravel("F"), _kinds[o.dtype.kind])
38 dim = SexpVector(o.shape, INTSXP)
39 res = ro.r.array(vec, dim=dim)
TypeError: ravel() takes exactly 1 argument (2 given) as o is in this case a |
Changing def ravel(self, order=None):
return self.values.ravel(order) will, together with the fix for isinstance above, produce a plot. So if pandas would add the order to |
@JanSchulz easy enough....on ravel; you can do a run-time path for |
@jtratner What is a run-time path? |
run-time patch.....e.g. I presume this is defined somewhere in |
@JanSchulz ok...ravel fix is in....give another try |
I have experience with rpy, how can I be of service? |
@jreback , you're not suggesting fixing by monkey patching rpy2 during numpy import are you? |
@y-p where exactly is |
in 0.13 (releasing imminently), Series is no longer a sub-class of so apparently their is some isinstance detection going on with |
We need either a way to modify pandas to keep it compatible or a reasonable PR submitted As much as I'm in a position to urge against it: monkey-patching another library is not an |
I'm pretty sure I have a fix, I just didn't have an example to work with. Should be a pretty trivial fix too. |
The patch is easy: --- rpy2\robjects\numpy2ri.py Fri Dec 20 17:35:43 2013
+++ rpy2\robjects\numpy2ri.py.new Fri Dec 20 17:42:42 2013
@@ -3,6 +3,7 @@
import rpy2.rinterface as rinterface
from rpy2.rinterface import SexpVector, INTSXP
import numpy
+import pandas
from rpy2.robjects.vectors import DataFrame, Vector, ListVector
@@ -26,7 +27,7 @@
def numpy2ri(o):
""" Augmented conversion function, converting numpy arrays into
rpy2.rinterface-level R structures. """
- if isinstance(o, numpy.ndarray):
+ if isinstance(o, (numpy.ndarray, pandas.Series)):
if not o.dtype.isnative:
raise(ValueError("Cannot pass numpy arrays with non-native byte orders at the moment.")) Source is here: https://bitbucket.org/lgautier/rpy2/src/e2d25d5bd6254c5e381d87c46c90cac30f18b5b2/rpy/robjects/numpy2ri.py?at=version_2.4.x I don't have hg, so I', not currently able to do a PR. |
I'd change it to: obj = getattr(obj, 'array', obj) That way there's no pandas dep and works with anything that supports the |
@lgautier if @JanSchulz keeps having trouble I can submit a pull request on bitbucket, because I have everything installed (just moving right now so I don't have a lot of free time). @y-p @jreback and other pandas peeps - Maybe we should consider making up a script that tests packages that we know depend on pandas so we can either make changes to pandas or notify those devs so they can work on a solution pre-release. |
@lgautier, I agree on most points (especially your time being your own) but we would like @jtratner, would you mind handling a backport of the patch + PR for the 2.3.x branch of rpy2 + testing |
@y-p okay, I'll try to get that done tonight. |
Thanks! let me know if you need to hand it off after all. |
Update: fix is relatively simple now that we pushed the ravel PR. I'm just checking to make sure that it ends up with the correct results and writing up some additional test cases. I'm not totally clear on the correct behavior, so I'm switching back and forth to make sure nothing broke. |
(and I'm on the same page with @JanSchulz 's one liner fix, with a small modification to make it work without pandas) - so is the goal to have a monkey patch in a gist that we can point to for 0.13? An actual function within pandas? Just not clear how we handle incompatibilities like this and making it less painful for people with legacy setups. |
2.4.x is already patched. @lgautier asked for a PR to release a new minor release of 2.3.x. so users The release notes already point to this issue and can and users can find the fix and discussion here |
Great |
... I'm assuming that PR will come from you? :) |
Yes |
0.13.0 is out, the dev version of rpy2 has been fixed. I hope @jtratner submitted that PR. closing. |
He did. I just merged it (rpy2 branch version_2.3.x, and grafted onto version_2.4.x). |
excellent. Thank you both. |
Is it on pypi yet? |
Not yet. Probably some time over the week-end. |
Looking fine with Python 2.7, but causing segfault with Python 3.3 and numpy 1.7.1 (https://drone.io/bitbucket.org/lgautier/rpy2/20). I cannot reproduce the segfault locally though. Any chance someone else could try out ? |
Rerun the build, if it consistently segfaults, I'll take a look. |
Thanks for putting that fix in @lgautier! |
@y-p The build on drone.io is consistently ending with a segfault on Python 3.3, while the same code does not locally. I'll hold the release of rpy2-2.3.9 until at least one of the 2 things happens: others report to have it working fine, or the problem on drone.io is identified (and fixed). |
That makes good sense. I'll try to repro on my box. |
I can reproduce the segfault, and I can reproduce it prior to @jtratner's commit,
You should make sure the CI output gives your more information. |
It might well be the case: there was no build on drone for the branch version_2.3.x prior to the merge of the pull request. Now this is quite odd:
or
|
I probably used the wrong version of nose, I compiled for 3.3 but invoked the py2 nose. Trying again:
... but no segfault. This is with db6c132, the current tip of the That's all I have. |
For what it's worth, I get a bunch of segfaults on the released version |
This must be version-specific somewhere, making it OK on some machine (my computer, @y-p 's box) but not others (drone's VM, your machine). C compiler ? R version ? something else ? (The issue tracker for rpy2 on bitbucket might be a better place to follow up on this) |
I've seen issues raised by differences between debian and ubuntu libc. I run fedora, what do you and |
Ubuntu (me 13.10, not sure about the version used by drone) |
It seems that the changes to to Series break the data conversion to R: running this Notebook doesn't work anymore with some dev version from last week:
https://gist.github.com/kevindavenport/7771325/raw/87ab5603f406729c6a3866f95af9a1ebfedcf619/Mahalanobis_Outliers.ipynb
The resulting error is this:
I'm not sure if this is something pandas cares, but even if not it would be nice to mention it in the release notes.
The text was updated successfully, but these errors were encountered: