

[Question]: Load quantized model from onnx format. #3226

Open
SylvainVerdy opened this issue Apr 28, 2023 · 2 comments
Labels
question (Further information is requested), wontfix (This will not be worked on)

Comments

@SylvainVerdy

Question

Hi,

I have several questions about ONNX models and quantization. I tried to export my models to ONNX, and I succeeded in saving one in ONNX format.

from flair.models import SequenceTagger

# Load the trained tagger and move it to CPU for export.
model = SequenceTagger.load("./exps/camembert-large/models/NER/Flair/taggers/sota-ner-flair/best-model.pt").cpu()

# Export the transformer embeddings to ONNX and swap them into the model.
model.embeddings = model.embeddings.export_onnx(
    "flert-embeddings.onnx", sentences, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

quantize = True
onnx_optimize = True

if quantize:
    # Quantize the exported embeddings.
    model.embeddings.quantize_model(
        "flert-quantized-embeddings.onnx", extra_options={"DisableShapeInference": True}
    )

if onnx_optimize:
    # Run graph optimizations through onnxruntime only.
    model.embeddings.optimize_model(
        "flert-optimized-embeddings.onnx", opt_level=2, use_gpu=False, only_onnxruntime=True
    )

First question:
Is it normal to see Ignore MatMul due to non constant B : /[/model/encoder/layer1../attention/self/MatMul..] when I try to quantize my model?

Second:
I'm trying to load my model with SequenceTagger.load(). Do I need to save the model as a .pt file at the end of the code above, so that loading it into SequenceTagger uses the TransformerOnnxWordEmbeddings class?

Do you have an example of loading ONNX files for inference, to evaluate a corpus or a few sentences?

Thanks a lot for your work!

@SylvainVerdy SylvainVerdy added the question Further information is requested label Apr 28, 2023
@helpmefindaname
Member

helpmefindaname commented May 3, 2023

Hi @SylvainVerdy

Is it normal to see Ignore MatMul due to non constant B : /[/model/encoder/layer1../attention/self/MatMul..] when I try to quantize my model?

Yes, that warning is normal. I cannot tell you exactly why it happens or what it means, but it appears on all Hugging Face models I have tested so far.

Do I need to save at the end of the code above my model in .pt to load into SequenceTagger to use TransformerOnnxWordEmbeddings class?

Yes, as stated in the tutorial, you need to save the model again and keep using the new model.
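For concreteness, a minimal sketch of that saving step (the file name is illustrative; it assumes the export code from the question has already run, so `model.embeddings` now holds the ONNX embeddings):

```python
# Persist the tagger together with its ONNX embeddings so it can be
# reloaded later (illustrative file name).
model.save("flert-onnx-tagger.pt")

# From then on, load and use this new model instead of the original one;
# the reloaded tagger uses the TransformerOnnxWordEmbeddings class internally.
from flair.models import SequenceTagger
model = SequenceTagger.load("flert-onnx-tagger.pt")
```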

Do you have any example of loading onnx files in Inference to evaluate a corpus or several sentences?

There are no examples specific to models that contain ONNX embeddings, because you use the model exactly the same way as before.
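To illustrate that ordinary usage, a short sketch (assuming `model` is a SequenceTagger that was saved with ONNX embeddings and then reloaded; the sentence text is just an example):

```python
from flair.data import Sentence

# Inference works exactly as with a non-ONNX model.
sentence = Sentence("George Washington went to Washington.")
model.predict(sentence)
print(sentence.to_tagged_string())

# Evaluating a corpus is also unchanged, e.g.:
# result = model.evaluate(corpus.test, gold_label_type="ner")
```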

@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Sep 17, 2023