-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support to sklearn TargetEncoder #1137
Conversation
Thanks for the contribution. One line should be removed. Everything else looks good. |
[("input", StringTensorType([None, X.shape[1]]))], | ||
target_opset=TARGET_OPSET, | ||
) | ||
self.assertTrue(model_onnx is not None) |
Check notice
Code scanning / CodeQL
Imprecise assert Note test
target_opset=TARGET_OPSET, | ||
) | ||
self.assertTrue(model_onnx is not None) | ||
self.assertTrue(model_onnx.graph.node is not None) |
Check notice
Code scanning / CodeQL
Imprecise assert Note test
[("input", Int64TensorType([None, X.shape[1]]))], | ||
target_opset=TARGET_OPSET, | ||
) | ||
self.assertTrue(model_onnx is not None) |
Check notice
Code scanning / CodeQL
Imprecise assert Note test
target_opset=TARGET_OPSET, | ||
) | ||
self.assertTrue(model_onnx is not None) | ||
self.assertTrue(model_onnx.graph.node is not None) |
Check notice
Code scanning / CodeQL
Imprecise assert Note test
model_onnx = convert_sklearn( | ||
model, "ordinal encoder two string cats", inputs, target_opset=TARGET_OPSET | ||
) | ||
self.assertTrue(model_onnx is not None) |
Check notice
Code scanning / CodeQL
Imprecise assert Note test
Signed-off-by: boccaff <[email protected]>
Signed-off-by: boccaff <[email protected]>
Signed-off-by: boccaff <[email protected]>
e945a85
to
e243865
Compare
Thanks for the comments @xadupre. I've removed the line, and solved a couple of the CodeQL suggestions (removed an unused import and an unused variable). The rest of the CodeQL suggested changes would diverge from the other implementations. For the |
Signed-off-by: boccaff <[email protected]>
Signed-off-by: boccaff <[email protected]>
The PR looks ok. I'll merge it when the CI test pass. |
You should write |
@xadupre , I've noticed that the converter for the PS: should we also raise on OrdinalEncoder? I could follow up on a different issue. |
2 or 3 should not matter. If you set 3, onnx won't find any version for opset 3 so it chooses the latest available, so 2. |
Signed-off-by: boccaff <[email protected]>
@@ -419,3 +419,48 @@ def _run(self, x, int64_vocabulary=None, string_vocabulary=None): | |||
return (np.array(res),) | |||
|
|||
raise TypeError(f"x must be iterable not {type(x)}.") # pragma: no cover | |||
|
|||
class TargetEncoder(OpRun): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be needed. TargetEncoder is not part of onnx specs and your converter does not produce such a node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed that and the reference to it on tests/test_utils/utils_backend_onnx.py
. Tks for the review.
Signed-off-by: boccaff <[email protected]>
Signed-off-by: boccaff <[email protected]>
Signed-off-by: boccaff <[email protected]>
py 3.12 and sklearn==1.4.2 appears ok, 1.5.2 also and 1.6.2. |
According to this https://scikit-learn.org/1.3/modules/generated/sklearn.preprocessing.TargetEncoder.html, the attribute does not exist in that version. You can either choose to retreive the information from another attribute or say the converter only works after sklearn>=1.4. Since it is a new converter, I don't see any issue with that. We may address older versions if other users post an issue on github. So feel free to raise an exception if the attribute is missing giving the information the user needs to understand why it does not work. |
The attribute classes was introduced later on 1.4. Changed the test to not rely on it. Signed-off-by: boccaff <[email protected]>
Opted for checking only |
Scikit 1.3.2 does not allow multiclass to be fitted. Signed-off-by: boccaff <[email protected]>
Signed-off-by: boccaff <[email protected]>
thank you for the support and patience through this PR @xadupre |
This PR implements a converter and a shape calculator for the TargetEncoder class introduced in Scikit-learn 1.5. The code follows much of the implementation of the converter for Ordinal Encoder.
A partial suit of tests is already implemented, but there is at least a couple of additional tests that I would like to add (missing values and using the smooth parameter from sklearn, even though I think it shouldn't matter).