ENH: Handle categorical dtype to/from R #9187

jseabold · 2015-01-02T19:55:15Z

No description provided.

jseabold · 2015-01-02T22:22:53Z

Failure is unrelated and why didn't skip-ci work?

shoyer · 2015-01-04T20:08:15Z

Is using the pandas interface the preferred way to do pandas/R dataframe conversion, or should we leave this to rpy2 now that it has built-in support for pandas? I've had pretty good luck with using pandas2ri.activate() from rpy2, which I believe entirely skips the conversion machinery in pandas itself. At the very least, I suspect a documentation update is due.

@lgautier any input?

For reference, here is where you'll find the pandas conversion code in rpy2: https://bitbucket.org/rpy2/rpy2/src/f1f15fe7eb6d1d65e8b3de4ee7fd697b800f001e/rpy/robjects/pandas2ri.py?at=default

BTW, I restarted the failed Travis job -- we'll see if it works this time.

lgautier · 2015-01-04T21:06:52Z

I am adding the following comment so that you (pandas) can make your decision with data.

rpy2 has integrated a two-way conversion of pandas DataFrame to rpy2 DataFrame (proxy of data.frame in the embedded R) since version 2.4.0 (current release is 2.5.4) and there is currently no plan to drop support for it: rmagic, originally in ipython and now part of rpy2, appears to rely on pandas for a prime representation of data frames in Python (@davclark can comment or add anything I'd have omitted about this).

While the conversion in rpy2 is covered by unit tests and is thought to be practical and well behaved, rpy2 is modular and relatively easy to customize by design. Alternative conversions, built from scratch or on top of the existing ones, can be added (see http://rpy.sourceforge.net/rpy2/doc-2.5/html/robjects_convert.html?highlight=register#customizing-the-conversion).

jseabold · 2015-01-05T15:33:49Z

I don't have a strong opinion on this, though I suspect any "fixes" should definitely make it into rpy2. As a prior though, I assume pandas development moves a bit faster, but that's mostly a guess. My mercurial/bitbucket-fu is also super weak (so far), so it's easier for me to make a pandas PR :)

jorisvandenbossche · 2015-01-05T23:23:56Z

Related discussions in #7309 and #7385

jreback · 2015-01-05T23:47:32Z

@jseabold it does not appear to me that any of this is actually tested on travis (it is all skipped), though it doesn't even print the skip message.

So ok with putting this in if you can confirm that it works locally for you?

cc @unutbu can you give a check?

lgautier · 2015-01-06T00:01:06Z

@jseabold

> As a prior though, I assume pandas development moves a bit faster, but that's mostly a guess.

Ya ya pandas is more and better in any possible and conceivable way [yawn]
;-)

> My mercurial/bitbucket-fu is also super weak (so far), so it's easier for me to make a pandas PR :)

1- Go to https://bitbucket.org/rpy2/rpy2
2- Click "Fork" (left side, 5th icon)
3- Edit in the web browser
4- Submit a pull request

unutbu · 2015-01-06T13:54:50Z

pandas/rpy/common.py

+        levels = np.asarray(obj.levels)
+        values = np.asarray(obj)
+        if com.is_float_dtype(values):
+            mask = np.isnan(values)


@jseabold: I don't see a test exercising this code path (when is_float_dtype(values) is True). Can you explain its purpose and/or perhaps add a test?

It was copy-pasted from the existing code.

@jseabold: Sorry, my bad. I see now that that code was added due to GH #1615. Unfortunately, there appears to be a regression when factors_as_strings=False:

from pandas.rpy.common import load_data
prestige = load_data('Prestige', 'car', factors_as_strings=False)

raises ValueError: codes need to be between -1 and len(categories)-1.

And the issue also affects factors_as_strings=True, where

prestige = load_data('Prestige', 'car', factors_as_strings=True)

raises IndexError: index 2147483647 is out of bounds for axis 0 with size 3
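One plausible reading of the two errors above, which nothing in this thread confirms: R stores factor codes as 1-based integers and marks a missing value with the smallest 32-bit integer, so shifting the raw codes to 0-based without masking first wraps that sentinel around to 2147483647. The sketch below reproduces the wraparound with plain NumPy and shows one way to map missing codes to -1, which is what `pandas.Categorical.from_codes` expects. The helper name `factor_codes_to_categorical`, the constant `R_NA_INTEGER`, and the sample levels are made up for illustration and are not taken from `pandas/rpy/common.py`.

```python
import numpy as np
import pandas as pd

# R represents NA_integer_ as the minimum 32-bit signed integer.
R_NA_INTEGER = np.iinfo(np.int32).min


def naive_shift(r_codes):
    """Shift 1-based codes to 0-based in int32 without masking NA first."""
    return np.asarray(r_codes, dtype=np.int32) - np.int32(1)


def factor_codes_to_categorical(r_codes, levels):
    """Hypothetical helper: build a Categorical from R-style factor codes."""
    r_codes = np.asarray(r_codes, dtype=np.int32)
    missing = r_codes == R_NA_INTEGER
    # Widen before subtracting so the NA sentinel cannot overflow,
    # then replace missing positions with -1 (pandas' missing code).
    codes = np.where(missing, -1, r_codes.astype(np.int64) - 1)
    return pd.Categorical.from_codes(codes, categories=list(levels))


raw = [1, 3, R_NA_INTEGER, 2]          # made-up sample data
levels = ["bc", "prof", "wc"]          # made-up level labels

print(naive_shift(raw))                # the NA slot wraps to 2147483647
print(factor_codes_to_categorical(raw, levels))
```

If this reading is right, the `factors_as_strings=True` failure would come from indexing the three levels with the wrapped value, and the `factors_as_strings=False` failure from passing it to the Categorical constructor as a code.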

unutbu · 2015-02-05T21:38:11Z

A use case for this PR: http://stackoverflow.com/q/28353937/190597

jreback · 2015-03-08T14:39:29Z

let's redirect to rpy2 for these types of conversions, see #9602

jseabold added 6 commits January 2, 2015 12:31

TST: Make test work.

2a63579

TST: Test DataFrame code path for factors

0b3327f

ENH: Convert R factors to pandas.Categoricals

0d75807

TST: Test to R data.frame with categorical

57cf1e7

ENH: Handle to R data.frame with Categorical

5f4db39

ENH: Support int8 and int16. [skip-ci]

620b5ed

jreback added the Categorical label Jan 5, 2015

unutbu reviewed Jan 6, 2015

sinhrks mentioned this pull request Jan 31, 2015

ENH: automatic rpy2 instance conversion #7385

Closed

jreback mentioned this pull request Mar 6, 2015

DEPR/DOC: rpy deprecation #9602

Closed


jreback closed this Mar 8, 2015

jorisvandenbossche added this to the No action milestone Mar 8, 2015


unutbu Jan 6, 2015

jseabold Jan 9, 2015

unutbu Jan 11, 2015
