Reordering the columns of a pandas DataFrame at prediction time gives inconsistent results if the column order differs from the one used in training. The LightGBM Python API doesn't seem to take the column names of pandas DataFrames into account when predicting.
Reproducible example
import numpy as np
import pandas as pd
import lightgbm as lgb

df = pd.DataFrame(
    [[0, 1, 1]] * 24 + [[0, 1, 0]] * 24 + [[1, 0, 0]] * 21 + [[1, 0, 1]] * 21,
    columns=['y', 'x1', 'x2'],
)
cols_feat = ['x1', 'x2']
y_true = df['y']
lgb_train = lgb.Dataset(df[cols_feat], y_true)
model = lgb.train({}, lgb_train)

# original order of columns
y_pred = model.predict(df[cols_feat])
print(y_pred)
# [1.23953194e-05, ..., 9.99985834e-01, ...]
print('loss:', np.mean((y_pred - y_true)**2))
# loss: 1.7559308212239284e-10

# reverse the order of columns
y_pred2 = model.predict(df[cols_feat[::-1]])
print(y_pred2)
# [1.23953194e-05, ..., 9.99985834e-01, ..., 1.23953194e-05, ...]
print('loss:', np.mean((y_pred2 - y_true)**2))
# loss: 0.49998666045229395, much worse than the original order
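For what it's worth, the loss of roughly 0.5 is what you would expect if the features are matched purely by position. In this data `y` is always `1 - x1`, so a rough sketch can stand in for the trained model with a hypothetical `idealized_model` that only looks at whichever value arrives in the first position (this is an assumption for illustration, not what LightGBM does internally):

```python
# Rows are [y, x1, x2], copied from the reproducible example.
rows = [[0, 1, 1]] * 24 + [[0, 1, 0]] * 24 + [[1, 0, 0]] * 21 + [[1, 0, 1]] * 21


def idealized_model(first_feature, second_feature):
    # Assumption: a stand-in for the trained booster. In this data y == 1 - x1,
    # so the stand-in uses only the value passed in the first position.
    return 1 - first_feature


def mse(predictions, targets):
    # Mean squared error over two equally long sequences.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


y_true = [y for y, _, _ in rows]

# Features passed in the training order (x1, x2).
pred_trained_order = [idealized_model(x1, x2) for _, x1, x2 in rows]
# Features passed in the reversed order (x2, x1), matched by position only.
pred_reversed_order = [idealized_model(x2, x1) for _, x1, x2 in rows]

loss_trained = mse(pred_trained_order, y_true)
loss_reversed = mse(pred_reversed_order, y_true)
print('loss (training order):', loss_trained)
print('loss (reversed order):', loss_reversed)
```

With the columns reversed, the 24 rows `[0, 1, 0]` and the 21 rows `[1, 0, 1]` are mispredicted, which is 45 of 90 rows and matches the observed loss of about 0.5.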
Environment info
LightGBM version or commit hash: 2.3.1
Command(s) you used to install LightGBM
pip3 install lightgbm
Additional Comments
Not sure if this is a bug or should be a feature request. The LightGBM model clearly knows the feature names (model.feature_name()), but the prediction stage doesn't check them, which can lead to confusion. It would be great if LightGBM could check the order of columns at prediction time. Thanks!
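In the meantime, a possible workaround on the user side is to select the columns in the model's own order before calling `predict`. The helper below is only a sketch: `columns_in_training_order` is a hypothetical name, not part of LightGBM, and it assumes the names returned by `model.feature_name()` are the DataFrame column names that were used for training.

```python
def columns_in_training_order(frame_columns, trained_feature_names):
    """Return the trained feature names, in training order, for column selection.

    Hypothetical helper (not a LightGBM function). Raises ValueError if the
    frame lacks any feature the model was trained on.
    """
    available = set(frame_columns)
    missing = [name for name in trained_feature_names if name not in available]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    return list(trained_feature_names)


# Stand-in lists: columns arrive reversed, the model was trained on x1 then x2.
reordered = columns_in_training_order(['x2', 'x1'], ['x1', 'x2'])
print(reordered)
```

With the `model` and `df` from the example above, the call would look like `model.predict(df[columns_in_training_order(df.columns, model.feature_name())])`, so the columns are put back into the training order whatever order `df` has them in.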
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.