RuntimeError: Not all operators have been evaluated. A variable name is probably misspelled. #817

Closed
ebolotin6 opened this issue Jan 28, 2022 · 9 comments

@ebolotin6

Hello,

I have the following sklearn pipeline with a StackingClassifier that uses 2 XGB classifiers (stored in a dict) as estimators:

stacking_ensemble = StackingClassifier(
    estimators=list(map(tuple, classifiers.items())),
    stack_method='predict_proba',
    passthrough=False,
)

pipeline = Pipeline(steps=[
    ('cbe', ColTransformer()),
    ('sc', stacking_ensemble),
])
pipeline.fit(x_train, y_train)
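
For readers unfamiliar with the `estimators` argument above, here is a minimal sketch of what `list(map(tuple, classifiers.items()))` produces. The dictionary keys and the placeholder strings are hypothetical stand-ins for the two fitted XGB classifiers, which are not shown in this thread.

```python
# Placeholder strings stand in for the two XGB classifiers (hypothetical names).
classifiers = {"xgb_a": "estimator_a", "xgb_b": "estimator_b"}

# The expression used in the post: one (name, estimator) tuple per dict entry.
estimators = list(map(tuple, classifiers.items()))

# dict.items() already yields tuples, so this shorter form gives the same list.
same_estimators = list(classifiers.items())

print(estimators)
```

Either form yields the list of `(name, estimator)` pairs that `StackingClassifier` expects.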

If I try to convert stacking_ensemble to onnx on its own, it works.
If I try to convert ColTransformer to onnx on its own, it works.
If I try to convert a sklearn pipeline with ColTransformer and any other sklearn model (including ensemble models like voting classifier) - it works.

However when I try to convert the above pipeline (specifically with a StackingClassifier) to onnx, I get this:
RuntimeError: Not all operators have been evaluated. A variable name is probably misspelled.

The only operator left with is_eval=None is this one:

Operator(type='SklearnLinearClassifier', onnx_name='SklearnLinearClassifier', inputs='merged_probability_tensor', outputs='label3,probability_tensor2', raw_operator=LogisticRegression())

This operator is the final_estimator in the StackingClassifier per here, which defaults to a LogisticRegression classifier.
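
To illustrate what this kind of error generally means, here is a toy sketch of a graph walk in which an operator can only be evaluated once all of its inputs have been produced. It is not skl2onnx's implementation: `ToyOperator`, `evaluate_all`, and the operator names are hypothetical, and the misspelled output name is planted on purpose.

```python
from dataclasses import dataclass


@dataclass
class ToyOperator:
    name: str
    inputs: list
    outputs: list
    is_evaluated: bool = False


def evaluate_all(operators, graph_inputs):
    """Mark each operator as evaluated once all of its inputs are known."""
    known = set(graph_inputs)
    progress = True
    while progress:
        progress = False
        for op in operators:
            if not op.is_evaluated and all(i in known for i in op.inputs):
                op.is_evaluated = True
                known.update(op.outputs)
                progress = True
    pending = [op.name for op in operators if not op.is_evaluated]
    if pending:
        raise RuntimeError(f"Not all operators have been evaluated: {pending}")
    return known


ops = [
    ToyOperator("encoder", ["input"], ["encoded"]),
    # Planted mistake: this output name does not match the next operator's input.
    ToyOperator("stack", ["encoded"], ["merged_probability_tensr"]),
    ToyOperator("final", ["merged_probability_tensor"], ["label", "probabilities"]),
]

message = ""
try:
    evaluate_all(ops, ["input"])
except RuntimeError as exc:
    message = str(exc)
print(message)
```

In this sketch the last operator is never evaluated because nothing upstream produces the exact variable name it consumes, which is the situation the error message hints at.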

Do you know what the problem might be? Any help is greatly appreciated.

Thank you very much,
EB

@xadupre
Collaborator

xadupre commented Feb 1, 2022

Which version of scikit-learn are you using?

@xadupre
Collaborator

xadupre commented Feb 1, 2022

I tried to cover your example by adding two unit tests but it did not fail for me. What are the differences between your model and the ones I added in PR #820?

@ebolotin6
Author

ebolotin6 commented Feb 1, 2022

Hello, huge thanks for your reply!

I'm using sklearn version: 1.0.1.

Attached is the graph of the pipeline. But first, let me clarify the pipeline from above:

pipeline = Pipeline(steps=[
    ('cbe', ColTransformer()),
    ('sc', stacking_ensemble),
])

In the above, ColTransformer is a transformer used for converting a dataframe or numpy matrix of mixed types (strings, ints, floats) into the same shaped output of floats. Its specific name (in the attached graph) is CatColEncoder and its main purpose is for encoding categorical columns. Some notes:

  • The output shape of the CatColEncoder is the same as its input shape.
  • The type of the input data to CatColEncoder can be a matrix of mixed types (strings, floats, ints), and the output type of CatColEncoder will be a float matrix.
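
The behaviour described in the two notes above can be sketched in plain Python. The real CatColEncoder is not shown in this thread, so the function below is a hypothetical stand-in, and the encoding scheme (integer codes assigned in first-seen order per column) is an assumption.

```python
def encode_mixed_rows(rows):
    """Return a same-shaped matrix of floats; string values are label-encoded."""
    n_cols = len(rows[0])
    # One label table per column; only consulted for string values.
    labels = [{} for _ in range(n_cols)]
    encoded = []
    for row in rows:
        out = []
        for j, value in enumerate(row):
            if isinstance(value, str):
                # Assign the next integer code the first time a category is seen.
                code = labels[j].setdefault(value, len(labels[j]))
                out.append(float(code))
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded


rows = [["red", 1, 0.5], ["blue", 2, 1.5], ["red", 3, 2.5]]
encoded = encode_mixed_rows(rows)
print(encoded)
```

The output has the same number of rows and columns as the input, and every cell is a float.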

Regarding the attached:

On line 122 is this: Identity: ['index1'] -> ['column_index'].

  • If you look at the graph, index1 is the first label-encoded column of CatColEncoder
    • It seems to appear out of nowhere.
  • The variable name column_index is defined on line 74 of stacking.py (inside operator_converters dir):
column_index_name = scope.get_unique_variable_name('column_index')
  • However, the identity operation itself is defined on line 28 of pipelines.py:
    for fr, to in zip(outputs, operator.outputs):
        container.add_node(
            'Identity', fr.full_name, to.full_name,
            name=scope.get_unique_operator_name("Id" + operator.onnx_name))

  • There is another identity operation in stacking.py but, ironically, I don't think it's responsible for this operation: Identity: ['index1'] -> ['column_index']
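
As background for the names discussed in the list above, here is a toy sketch of how a scope might hand out unique variable names from a seed. The method name is borrowed from the snippet quoted above, but `ToyScope` and the integer-suffix scheme are assumptions and have not been checked against the skl2onnx source.

```python
class ToyScope:
    """Hypothetical stand-in for a converter scope that tracks used names."""

    def __init__(self):
        self._used = set()

    def get_unique_variable_name(self, seed):
        """Return seed if unused, otherwise seed plus the first free integer suffix."""
        name = seed
        suffix = 1
        while name in self._used:
            name = f"{seed}{suffix}"
            suffix += 1
        self._used.add(name)
        return name


scope = ToyScope()
names = [scope.get_unique_variable_name("index") for _ in range(3)]
column_index_name = scope.get_unique_variable_name("column_index")
print(names, column_index_name)
```

Under this assumed scheme, a name such as `index1` would simply be the second variable requested with the seed `index`, so two converters asking for similar seeds could end up with names that look related even when they are not.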

I'm at a bit of a loss, not sure where this error is originating or what's causing the graph to break. The output of CatColEncoder successfully flows through every step of the pipeline - and conversion to onnx is successful when a voting classifier is used instead of a stacking classifier.

Any thoughts, hints, suggestions are very appreciated.

Thank you,
EB

@xadupre
Collaborator

xadupre commented Feb 2, 2022

I'm puzzled. I tried to use a dataframe as an input but it still works (see the last commit in the PR). And if ColTransformer is a custom transformer, the conversion should have failed with a message saying there is no converter for this class, unless you registered one. How do you convert the model?

@ebolotin6
Author

I'm puzzled too, and I really want to get this to work. Attached is a zip file that contains a demo notebook that you can run. The converter is inside cce_onnx_converter.py.

catcol_demo.zip

Thanks again,
EB

@xadupre
Collaborator

xadupre commented Feb 4, 2022

I tried your notebook but nothing fails for me. I then tried the following pipeline but still no failure. I did not find any model with StackingClassifier. Did I miss something?

pipeline = Pipeline(steps=[
    ('cbe', CatColEncoder(all_col_names=x_df.columns)),
    ('nan', SimpleImputer()),
    ('sc', LogisticRegression()),
])
pipeline.fit(x_df, y)

@ebolotin6
Author

Hi, thanks for responding! Attached is a new notebook that fully demonstrates the bug (titled sc_bug.ipynb). The traceback is at the bottom. Notice 2 things from the traceback graph:

  • The final operator is left with is_eval=None.
  • This identity appears out of nowhere (or at least unintentionally): Identity: ['index1'] -> ['column_index']

And to reiterate: the problem is specifically with a pipeline that contains the StackingClassifier (other classifiers work fine):

catcol_demo.zip

Much appreciated,
EB

@xadupre
Collaborator

xadupre commented Feb 9, 2022

I was finally able to replicate the bug and found the cause. I updated the PR to fix it. I should release a new version by the end of the week.

xadupre added a commit that referenced this issue Feb 9, 2022
* investigate an issue with StackingClassifier
* fix issue 817
@ebolotin6
Author

Excellent, glad that you've solved it!! I look forward to the next version and will test again!

Thanks much,
EB
