[R-package] Fix error when passing categorical features to lightgbm() (fixes #6000) #6003
Thanks for identifying this issue and taking the time to submit a PR! But I think we should pursue a different fix... setting free_raw_data=FALSE this way means unnecessarily storing a copy of the passed-in data throughout training, which might cause out-of-memory issues for users.
Did you explore simply moving this call

LightGBM/R-package/R/lgb.train.R
Lines 182 to 185 in 44928d3

    # Write categorical features
    if (!is.null(categorical_feature)) {
      data$set_categorical_feature(categorical_feature)
    }

up further in lgb.train(), so it's run prior to the Dataset being constructed here?

LightGBM/R-package/R/lgb.train.R
Line 157 in 44928d3

    data$construct()
I haven't tested that yet, but I think it should solve the issue without requiring changes to lightgbm() or holding an extra copy of the training data in memory. Could you please try that?
Also don't worry about the CUDA CI job failures... we have a repo-wide issue with those jobs right now: #6001. Sorry for the inconvenience.
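For readers following along, the ordering suggested above can be sketched at the user level with the exported helpers lgb.Dataset.set.categorical() and lgb.Dataset.construct(). This is only an illustration under assumptions (synthetic data, column 2 treated as categorical); it is not the code changed in this PR:

```r
library(lightgbm)

# Synthetic data (an assumption for illustration only)
set.seed(1)
X <- matrix(rnorm(200L), ncol = 2L)
X[, 2L] <- sample(0:3, 100L, replace = TRUE)  # integer-coded categories
y <- rnorm(100L)

# Create the Dataset lazily; raw data is freed on construction by default
dtrain <- lgb.Dataset(X, label = y)

# Declare categorical features BEFORE construction, while the raw data
# is still available (indices are 1-based in the R package)
lgb.Dataset.set.categorical(dtrain, 2L)

# Only now build the underlying Dataset
lgb.Dataset.construct(dtrain)
```

Setting the categorical features first means the raw data never needs to be kept around after construction, which is the memory concern raised above.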
Yes, that also seems to fix the issue. Updated.
Confirmed that this test produces the error from #6000 on latest master, and that it's resolved here.
Thanks for the help as always!
fixes #6000
This PR fixes an error when supplying dataset parameters to lightgbm(), such as categorical_feature. Before this PR, the dataset was constructed with free_raw_data=TRUE, which prevented it from using parameters that require the raw data after dataset creation.
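A minimal sketch of the kind of call this affects, assuming synthetic data and a second column holding integer-coded categories (the data, column names, and parameter values here are illustrative and not taken from #6000):

```r
library(lightgbm)

# Synthetic regression data (an assumption for illustration only)
set.seed(42)
n <- 200L
X <- cbind(
    num_feat = rnorm(n),
    cat_feat = sample(0:3, n, replace = TRUE)  # integer-coded categories
)
y <- X[, "num_feat"] + 0.5 * X[, "cat_feat"] + rnorm(n, sd = 0.1)

# Pass the categorical column (1-based index) directly to lightgbm()
model <- lightgbm(
    data = X,
    label = y,
    params = list(objective = "regression", min_data_in_leaf = 5L),
    nrounds = 5L,
    categorical_feature = 2L,
    verbose = -1L
)

preds <- predict(model, X)
```

With the fix in place, a call like this should train without the user having to build an lgb.Dataset by hand.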