contributions mismatch for nominal features #16

Closed
lboussengui opened this issue Jun 4, 2024 · 3 comments · Fixed by #19

@lboussengui

  • ebm2onnx version: 3.1.1
  • onnxruntime: 1.16.1
  • interpret: 0.4.2
  • Python version: 3.10.8
  • Operating System: macOS

Description

I trained an EBM classification model. This model was initially saved in pickle format.

I used ebm2onnx as shown below to convert my model to the .onnx format.

I noticed that, after conversion to ONNX, the per-feature contributions to the prediction for a test case differ for nominal features: their contributions are set to zero.

Do you have an explanation for this?

What I Did

import ebm2onnx
import pickle
import onnxruntime as rt

# load first EBM 
with open(f'{MODEL_PATH}ebm_first.pkl', 'rb') as f:
    ebm_first  = pickle.load(f)

# load dtypes saved during model training 
with open(f'{MODEL_PATH}training_dtypes_for_onnx.pkl', 'rb') as f:
    training_dtypes_for_onnx  = pickle.load(f)

# transform ebm to onnx 
onnx_model = ebm2onnx.to_onnx(
    model=ebm_first,
    predict_proba=True,  # Generate a dedicated output for probabilities
    explain=True,  # Generate a dedicated output for local explanations
    dtype=training_dtypes_for_onnx,
    name='DEFAULT',
)

Here is the result of the local explanation with the pickled EBM model for one example:

pred_pkl = ebm_first.explain_local(X_test, y_test)
pred_pkl.data(0)['scores']

The result is:

array([ 0.027,  0.416, -0.158,  0.388,  0.043,  0.   , -0.196,  0.051,
       -0.201, -0.032,  0.176,  0.151,  0.   ,  0.216,  0.2  ,  0.376,
        0.05 ,  0.022, -0.076,  0.028, -0.26 , -0.043,  0.173,  0.269,
       -0.203, -0.025,  0.037, -0.056,  0.164,  0.296,  0.089,  0.08 ,
        0.1  ,  0.098, -0.018, -0.002, -0.001, -0.001, -0.003, -0.002])

After converting ebm_first to onnx_model, I did the following to imitate inference in production:

onnx_model.ir_version = 9
ebm_onnx = rt.InferenceSession(onnx_model.SerializeToString())
pred_onnx = ebm_onnx.run(None, X_test.to_dict("list"))

# contributions of pred_onnx 
pred_onnx[2][0][:, 0]

The result is:

array([ 0.027,  0.416, -0.158,  0.388,  0.   ,  0.   , -0.196,  0.051,
       -0.201, -0.032,  0.176,  0.151,  0.   ,  0.216,  0.2  ,  0.376,
        0.05 ,  0.022, -0.076,  0.028, -0.26 , -0.043,  0.173,  0.269,
       -0.203, -0.025,  0.037, -0.056,  0.164,  0.296,  0.089,  0.08 ,
        0.1  ,  0.098, -0.018, -0.002, -0.001, -0.001, -0.003, -0.002],
      dtype=float32)

The two arrays differ at indices 4 and 5, which correspond to the only nominal features in the dataset.
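A check like the following can locate the mismatched positions programmatically instead of by eye. This is only a minimal sketch: `mismatched_indices` is a hypothetical helper (not part of ebm2onnx or interpret), and the tolerance is an assumption chosen to absorb float32 versus float64 rounding. The sample values are the first eight entries of each array printed above.

```python
import math


def mismatched_indices(reference, candidate, abs_tol=1e-3):
    """Return the positions where two score sequences differ by more than abs_tol."""
    if len(reference) != len(candidate):
        raise ValueError("score sequences must have the same length")
    return [
        i
        for i, (r, c) in enumerate(zip(reference, candidate))
        if not math.isclose(r, c, rel_tol=0.0, abs_tol=abs_tol)
    ]


# First eight contributions from the pickled EBM and from the ONNX session,
# copied from the outputs above.
scores_pickle = [0.027, 0.416, -0.158, 0.388, 0.043, 0.0, -0.196, 0.051]
scores_onnx = [0.027, 0.416, -0.158, 0.388, 0.0, 0.0, -0.196, 0.051]

print(mismatched_indices(scores_pickle, scores_onnx))
```

For these eight values the helper reports position 4 only (0.043 versus 0.); position 5 is printed as 0. in both arrays.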

@MainRo MainRo added the bug Something isn't working label Jun 20, 2024
@MainRo MainRo self-assigned this Jun 20, 2024
@MainRo
Collaborator

MainRo commented Jun 20, 2024

Could you publish a model and a sample input that reproduce the issue here?
In the meantime, I will try to reproduce this in a unit test.

@MainRo
Collaborator

MainRo commented Jul 25, 2024

Could you confirm that the nominal features are of type boolean?
If so, can you try explicitly converting them to 0/1 before calling ebm_first.explain_local:

X_test['feature'] = np.where(X_test['feature'] == False, 0, 1)

I suspect an issue in the interpret explain_local implementation. It looks like the boolean features are not correctly mapped, and have scores of 0.0.

@MainRo
Collaborator

MainRo commented Jul 25, 2024

OK, forget my last comment. The problem is that the conversion to ONNX mutates the EBM model object.
If you call ebm_first.explain_local before converting to ONNX, you will get the same values.

Obviously, this is not normal behavior for the converter. I will fix this.
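Until a fixed release is installed, one possible workaround is to run the conversion on a deep copy, so the original model object is never touched. The sketch below is illustrative only: `convert_on_copy`, `ToyModel`, and `mutating_convert` are hypothetical names, and the toy converter merely stands in for a conversion with side effects. It also assumes the model can be deep-copied, which seems plausible for an object that is already pickled, but is not verified here.

```python
import copy


def convert_on_copy(model, convert):
    """Call `convert` on a deep copy so the caller's model stays unchanged."""
    return convert(copy.deepcopy(model))


class ToyModel:
    """Hypothetical stand-in for a trained model holding per-feature scores."""

    def __init__(self):
        self.term_scores = [0.043, 0.0]


def mutating_convert(model):
    """Hypothetical converter that modifies its input as a side effect."""
    exported = list(model.term_scores)
    model.term_scores[0] = 0.0  # side effect on the object it was given
    return exported


model = ToyModel()
exported = convert_on_copy(model, mutating_convert)

# The original model still holds its scores because only the copy was mutated.
print(model.term_scores)
```

With the real library, `convert` could be a small function wrapping the `ebm2onnx.to_onnx(...)` call from the report above. Calling ebm_first.explain_local before the conversion, as suggested, avoids the problem as well.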

MainRo added a commit that referenced this issue Jul 25, 2024
This prevents from using the ebm model correctly after the conversion.
Fixes #16

Signed-off-by: Romain Picard <[email protected]>
@MainRo MainRo closed this as completed in c4bf7ed Jul 25, 2024