if hasattr(xgb_node, 'missing') and not np.isnan(xgb_node.missing):
    raise RuntimeError(
        "Cannot convert a XGBoost model where missing values are not "
        "nan but {}.".format(xgb_node.missing))
Even when I initialize SparkXGBClassifier(missing=np.nan), this check still fails with:
TypeError: ufunc 'isnan' not supported for the input types,
and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Upon inspecting, xgb_node.missing is type pyspark.ml.param.Param so it makes sense that numpy can't apply the function. @xadupre can you provide some context on this change? I couldn't find much in the PR or linked issues but it seems like this is missing something like Param.value to access the data before passing to numpy
Example
Reproducible script using the following library versions
I saw your comments about sparsity, so I understand the motivation but it seems like the implementation has a bug. Do you recall any tests that were able to pass this check?
I'm not particularly familiar with pyspark, but it seems we cannot operate directly on the Param type without numpy complaining
After some research, the fix is to read the value via xgb_node.getOrDefault("missing") instead of using xgb_node.missing directly. I'll submit a PR with the change.
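To illustrate the difference between the two access patterns without needing a Spark session, here is a minimal sketch. `Param` and `FakeSparkXGBClassifier` are hypothetical stand-ins that only mimic the relevant shape of the pyspark classes, `check_missing` is an illustrative name rather than the converter's actual function, and `math.isnan` stands in for `np.isnan` (both raise `TypeError` on a non-numeric object).

```python
import math


class Param:
    """Hypothetical stand-in for pyspark.ml.param.Param: a descriptor, not a value."""

    def __init__(self, name):
        self.name = name


class FakeSparkXGBClassifier:
    """Hypothetical stand-in for an estimator that exposes `missing` as a Param."""

    def __init__(self, missing=float("nan")):
        self.missing = Param("missing")       # attribute access yields the Param object
        self._values = {"missing": missing}   # the configured value is stored separately

    def getOrDefault(self, name):
        # Mimics Params.getOrDefault: returns the value set for the parameter.
        return self._values[name]


def direct_access_fails(xgb_node):
    """Return True if calling isnan on the raw attribute raises TypeError."""
    try:
        math.isnan(xgb_node.missing)
    except TypeError:
        return True
    return False


def check_missing(xgb_node):
    """Sketch of the check with the value resolved through getOrDefault first."""
    if hasattr(xgb_node, "missing"):
        missing = xgb_node.getOrDefault("missing")
        if not math.isnan(missing):
            raise RuntimeError(
                "Cannot convert a XGBoost model where missing values are not "
                "nan but {}.".format(missing))
        return missing
    return None
```

With these stand-ins, `direct_access_fails(FakeSparkXGBClassifier())` reports the `TypeError`, while `check_missing` passes for a NaN `missing` value and raises `RuntimeError` for any other value.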
addisonklinke changed the title from "Spark XGB converter fails to recognize missing param" to "Spark XGB converter accesses missing param improperly" on Jul 2, 2024
Description
#373 introduced this constraint on the converter