TL;DR

I am seeing that, due to rounding errors, the score calculated by xgboost for a tree ensemble does not match the one expected from manually inspecting the tree model.
Background
We have our own implementation of tree scoring. While comparing the score that our library generates for a given tree model with the score (prediction) that xgboost produces for the same tree model, we find that, due to rounding errors somewhere in the tree traversal, the scores do not match.
Using some training data we trained a tree ensemble with xgboost, which outputs the following model:

TREE MODEL

For this test data:

TEST DATA
Xgboost score = 0.401647
Score with our own library = 0.4295020650820223
(NOTE: the scores of the individual trees are summed, and score = sigmoid(sum of scores from each tree).)
For the line marked "different branching" one can deduce that our library evaluates the condition as false, and hence ends up with -0.0398532 as the score of the third tree.

Based on the score generated by xgboost one can deduce that xgboost evaluates this same condition as true, and ends up with -0.154575 as the score of the third tree.
which usually means that the float split values are represented with 6 significant digits, as you can see in your example. And the default rounding, if I remember correctly, is towards zero.
You might try the following hack in order to see more digits in the split value: add fo.precision(18); after that line and rebuild.
Thanks @khotilov, that helped. Would it be a good idea to set the precision to the highest precision of float in xgboost, so as to avoid discrepancies between what xgboost uses as the split value/score and what other libraries consuming xgboost's output use during scoring? If so, I can create a pull request.
The same would be required for the prediction value as well.