ENH: Handle categorical dtype to/from R #9187
The failure is unrelated. And why didn't skip-ci work?
Is using the pandas interface the preferred way to do pandas/R dataframe conversion, or should we leave this to rpy2 now that it has built-in support for pandas? I've had pretty good luck with using @lgautier any input? For reference, here is where you'll find the pandas conversion code in rpy2: https://bitbucket.org/rpy2/rpy2/src/f1f15fe7eb6d1d65e8b3de4ee7fd697b800f001e/rpy/robjects/pandas2ri.py?at=default BTW, I restarted the failed Travis job -- we'll see if it works this time.
I am adding the following comment so you (
While the conversion in |
I don't have a strong opinion on this, though I suspect any "fixes" should definitely make it into rpy2. As a prior, though, I assume pandas development moves a bit faster, but that's mostly a guess. My mercurial/bitbucket-fu is also super weak (so far), so it's easier for me to make a pandas PR :)
Ya ya pandas is more and better in any possible and conceivable way [yawn]
1- Go to https://bitbucket.org/rpy2/rpy2
    levels = np.asarray(obj.levels)
    values = np.asarray(obj)
    if com.is_float_dtype(values):
        mask = np.isnan(values)
@jseabold: I don't see a test exercising this code path (when is_float_dtype(values) is True). Can you explain its purpose and/or perhaps add a test?
It was copy-pasted from the existing code.
@jseabold: Sorry, my bad. I see now that that code was added due to GH #1615. Unfortunately, there appears to be a regression when factors_as_strings=False:
    from pandas.rpy.common import load_data
    prestige = load_data('Prestige', 'car', factors_as_strings=False)
raises ValueError: codes need to be between -1 and len(categories)-1.
And the issue also affects factors_as_strings=True, where
    prestige = load_data('Prestige', 'car', factors_as_strings=True)
raises IndexError: index 2147483647 is out of bounds for axis 0 with size 3
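A possible reading of the index 2147483647 in that traceback, offered as an assumption and not a confirmed diagnosis of this PR: R stores a missing factor entry as NA_integer_, which is the smallest 32-bit signed integer (-2147483648). Shifting the 1-based R codes to 0-based codes in 32-bit arithmetic would wrap that sentinel around to 2147483647. The sketch below is plain Python with hypothetical helper names (wrap_int32, factor_codes_to_zero_based); it is not the pandas.rpy or rpy2 code, and the level names are only illustrative.

```python
# Hypothetical helpers for illustration; not the pandas.rpy or rpy2 implementation.
R_NA_INTEGER = -2**31  # R's NA_integer_ is the minimum 32-bit signed integer


def wrap_int32(n):
    """Emulate 32-bit signed integer wraparound."""
    return (n + 2**31) % 2**32 - 2**31


def factor_codes_to_zero_based(r_codes):
    """Map 1-based R factor codes to 0-based codes, using -1 for NA."""
    return [-1 if c == R_NA_INTEGER else c - 1 for c in r_codes]


r_codes = [1, 3, R_NA_INTEGER, 2]  # a factor with 3 levels and one NA
levels = ["bc", "prof", "wc"]      # illustrative level names

# Naive shift in 32-bit arithmetic: the NA sentinel wraps around.
naive = [wrap_int32(c - 1) for c in r_codes]

# Masking NA first keeps every code in the range -1 .. len(levels) - 1.
safe = factor_codes_to_zero_based(r_codes)

# Look up labels, mapping the -1 sentinel to a missing value.
labels = [levels[c] if c >= 0 else None for c in safe]
```

Codes in the masked form, with -1 marking a missing value, are what pandas.Categorical.from_codes accepts, which would be consistent with the "codes need to be between -1 and len(categories)-1" message in the other traceback.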
A use case for this PR: http://stackoverflow.com/q/28353937/190597
let's redirect to |