Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBDEV-4639: Added requirement for data.table. #4265

Merged
merged 14 commits into from
Feb 27, 2020

Conversation

wendycwong
Copy link
Contributor

This PR enable data.table by default as specified by JIRA: https://0xdata.atlassian.net/browse/PUBDEV-4639.

@wendycwong wendycwong requested a review from ledell January 28, 2020 02:36
Copy link
Contributor

@honzasterba honzasterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably change the default value fo use arg

@wendycwong
Copy link
Contributor Author

@honzasterba

Thanks for the review. I don't understand what that means. Please help? W

@wendycwong wendycwong force-pushed the pubdev-4639-use-data-table-rel-yu branch from 2f46e16 to fe16bb5 Compare January 29, 2020 17:57
@wendycwong
Copy link
Contributor Author

Honza: I have added the change in h2o-3-DESCRIPTION.template. Let me know if that is the right place. If not, please point me to the right file.

@honzasterba
Copy link
Contributor

@wendycwong no, I think we want to keep data.table in suggest since its not required, the data.table support should only enable itself when data.table is installed

@wendycwong
Copy link
Contributor Author

@honzasterba Can you tell me exactly what I need to change then. Thanks. W

@wendycwong wendycwong force-pushed the pubdev-4639-use-data-table-rel-yu branch from fe16bb5 to 7b72d37 Compare January 30, 2020 21:43
@honzasterba
Copy link
Contributor

I change to DESCRIPTION should be reverted as I see you did.
Also I added a commit to this PR but it is gone so I added the change again.

@wendycwong
Copy link
Contributor Author

Honza: Thanks for the help. Do you think this PR is ready to go? Or more changes are needed? I have been testing it continuously and it seems to work fine. Let me know. Thanks, Wendy

Copy link
Contributor

@honzasterba honzasterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the tests are green I think its fine

@honzasterba
Copy link
Contributor

maybe check if there is some docs that need updating @angela0xdata

@wendycwong wendycwong added the do not merge For PRs that are not supposed to be merged label Feb 3, 2020
@wendycwong
Copy link
Contributor Author

wendycwong commented Feb 3, 2020

@honzasterba: There are two runits that are failing in RSmoke test. That needs to be resolved before this PR can merge.

@wendycwong wendycwong force-pushed the pubdev-4639-use-data-table-rel-yu branch 3 times, most recently from bfa0de1 to 5a4c85e Compare February 12, 2020 17:48
@michalkurka
Copy link
Contributor

Needs fixing before merging:

[2020-02-13T01:04:38.981Z] * checking R code for possible problems

[2020-02-13T01:04:38.981Z] use.package: no visible global function definition for

[2020-02-13T01:04:38.981Z]   ‘installed.packages’

[2020-02-13T01:04:38.981Z] Undefined global functions or variables:

[2020-02-13T01:04:38.981Z]   installed.packages

[2020-02-13T01:04:38.981Z] Consider adding

[2020-02-13T01:04:38.981Z]   importFrom("utils", "installed.packages")

AFAIK @ledell offered to help with that

@wendycwong wendycwong force-pushed the pubdev-4639-use-data-table-rel-yu branch from 3ef138a to ec3c3e8 Compare February 14, 2020 17:08
@honzasterba
Copy link
Contributor

I have pushed a fix for the CRAN issue and also the actual change of the default value for data.table use.
@mattdowle could you please take a look at the test failures?

@wendycwong wendycwong force-pushed the pubdev-4639-use-data-table-rel-yu branch from 333e7d3 to 7d17920 Compare February 19, 2020 19:00
@wendycwong wendycwong force-pushed the pubdev-4639-use-data-table-rel-yu branch from fef748f to 2fa26f1 Compare February 20, 2020 17:19
@mattdowle
Copy link
Contributor

@honzasterba The PR I submitted #4308 does not appear to have been merged yet @wendycwong @ledell ?

@honzasterba
Copy link
Contributor

@mattdowle I believe wendy cherry picked your commit into this PR, I am now working on fixing the rest of the tests

@mattdowle
Copy link
Contributor

mattdowle commented Feb 21, 2020

@honzasterba Ah ok I see the cherry pick now.
I was looking at the error in example(as.h2o) and I see your recent commit. But I see fail again just now :

Error in .subset(x, j) : invalid subscript type 'list'
[2020-02-21T09:50:30.146Z] Calls: as.h2o ... .h2o.calc_stm_svm -> [ -> [.data.table -> [.data.frame
[2020-02-21T09:50:30.146Z] Execution halted

Did you write earlier above in this PR that data.table is downgraded to suggests in DESCRIPTION? If so that will affect using data.table syntax. It seems that as.h2o.Matrix is using data.table syntax. For that to work data.table needs to know h2o is data.table-aware.
https://stackoverflow.com/a/10529888

There is a flag that can be set in h2o's NAMESPACE file (.datatable.aware) that would allow h2o to use data.table syntax while still only suggesting it. We haven't had the situation before that a package (h2o in this case) wants to use data.table syntax itself, but only optionally depending on that package's user.

Options that spring to mind are :

  1. Remove the use of data.table syntax within h2o package itself. Just use data.table for read and write.
  2. h2o import data.table. But that would require users to install data.table which I understand you'd prefer not to.
  3. Set .datatable.aware.

However, with both options 2 and 3, if a user passes a data.table to a function in h2o that is really expecting a data.frame (i.e. it is using data.frame syntax), then unexpected behaviour could occur. When a package states it is data.table-aware, it really has to be fully aware throughout all exported functions. Packages which accept data.frame from a user, but want to use data.table syntax, should have an as.data.table() call at the start of each user-visible function that use data.table syntax, and an as.data.frame() call at the start of each exported function that uses data.frame syntax.

I think option 1 is by far the safest option in the case of h2o package.

@honzasterba
Copy link
Contributor

yes, I have fixed the data-table aware issue yesterday, but it seems that @wendycwong did another rebase afterwards and my commit is lost :(

@honzasterba honzasterba self-requested a review February 21, 2020 10:20
@wendycwong
Copy link
Contributor Author

Sorry guys. I will take my hands off this PR now.

@mattdowle
Copy link
Contributor

mattdowle commented Feb 22, 2020

This branch passes R CMD check locally for me. When running on Jenkins it seems to be running example(as.h2o) even though that code is inside a \dontrun{ ... } section. Has anyone seen that before?
Anyway, I guess something is testing donotrun blocks then, I managed to reproduce locally by running that donotrun manually...

@honzasterba honzasterba removed their request for review February 24, 2020 08:55
@honzasterba honzasterba self-assigned this Feb 24, 2020
@honzasterba
Copy link
Contributor

@mattdowle @ledell please review, I have decided to use .datatable.aware since the only place we should get in touch with data.frames is as.h2o, any other concerns?

@@ -5,7 +5,9 @@ test.pubdev_2844 <- function() {

df1 <- iris
h2o.no_progress()

# enable using of data.table for as.data.frame
options(h2o.fread=TRUE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this also enable it for the rest of the test suite? or is it just this test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it probably does, will fix

@honzasterba honzasterba added please review and removed do not merge For PRs that are not supposed to be merged labels Feb 27, 2020
@michalkurka
Copy link
Contributor

@honzasterba looks good to me, but please change the target branch to master and lets target 3.30 for this feature

@honzasterba honzasterba changed the base branch from rel-yu to master February 27, 2020 17:58
@honzasterba honzasterba force-pushed the pubdev-4639-use-data-table-rel-yu branch from 973f02f to 559fafa Compare February 27, 2020 17:58
@honzasterba honzasterba merged commit 3a799f0 into master Feb 27, 2020
@honzasterba honzasterba deleted the pubdev-4639-use-data-table-rel-yu branch February 27, 2020 20:14
@@ -3530,6 +3530,8 @@ as.data.frame.H2OFrame <- function(x, ...) {
# Versions of R prior to 3.1 should not use hex string.
# Versions of R including 3.1 and later should use hex string.
useHexString <- getRversion() >= "3.1"
# We cannot use data.table by default since its handling of escaping inside quoted csv values is not very good
# in some edge cases its simply impossible to load data in correct format without additional post processing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What specifically do you mean by "not very good"? This is unhelpful to write in this way. Tell me what's wrong and I'll fix it. Probably already known somewhere and I'll raise the priority. We're on the same team here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being so ambiguous. On second thought I should have been more to the point and link to existing issue.
This is the issue I was reffering to:
This is the issue: Rdatatable/data.table#1109
Here is the test that I was not able to make pass with data.table even with changes on the back-end: https://github.com/h2oai/h2o-3/pull/4265/files#diff-c87a495277fa01b186f5300c48973bf6R33

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants