

[Question]: Load quantized model from onnx format. #3226

Open
SylvainVerdy opened this issue Apr 28, 2023 · 2 comments
Labels
question (Further information is requested), wontfix (This will not be worked on)

Comments

@SylvainVerdy

Question

Hi,

I have several questions about ONNX models and quantization. I tried to export my models to ONNX, and I succeeded in saving one in ONNX format.

from flair.models import SequenceTagger

# Load the trained tagger and move it to CPU for export.
model = SequenceTagger.load("./exps/camembert-large/models/NER/Flair/taggers/sota-ner-flair/best-model.pt").cpu()

# Export the transformer embeddings to ONNX and swap them into the model.
model.embeddings = model.embeddings.export_onnx(
    "flert-embeddings.onnx", sentences, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

quantize = True
onnx_optimize = True

if quantize:
    # Quantize the exported embeddings.
    model.embeddings.quantize_model(
        "flert-quantized-embeddings.onnx", extra_options={"DisableShapeInference": True}
    )

if onnx_optimize:
    # Run graph optimizations through onnxruntime only.
    model.embeddings.optimize_model(
        "flert-optimized-embeddings.onnx", opt_level=2, use_gpu=False, only_onnxruntime=True
    )

First question:
Is it normal to see Ignore MatMul due to non constant B : /[/model/encoder/layer1../attention/self/MatMul..] when I try to quantize my model?

Second:
I'm trying to load my model with SequenceTagger.load(). Do I need to save the model as a .pt file at the end of the code above, so that loading it into SequenceTagger uses the TransformerOnnxWordEmbeddings class?

Do you have an example of loading ONNX files for inference, to evaluate a corpus or a few sentences?

Thanks a lot for your work!

@SylvainVerdy SylvainVerdy added the question Further information is requested label Apr 28, 2023
@helpmefindaname
Member

helpmefindaname commented May 3, 2023

Hi @SylvainVerdy

Is it normal to see Ignore MatMul due to non constant B : /[/model/encoder/layer1../attention/self/MatMul..] when I try to quantize my model?

Yes, that warning is normal. I cannot tell you exactly why it happens or what it means, but it appears on all Hugging Face models I have tested so far.

Do I need to save at the end of the code above my model in .pt to load into SequenceTagger to use TransformerOnnxWordEmbeddings class?

Yes, as stated in the tutorial, you need to save the model again and keep using the new model.
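For concreteness, a minimal sketch of that saving step (the file name is illustrative; it assumes the export code from the question has already run, so `model.embeddings` now holds the ONNX embeddings):

```python
# Persist the tagger together with its ONNX embeddings so it can be
# reloaded later (illustrative file name).
model.save("flert-onnx-tagger.pt")

# From then on, load and use this new model instead of the original one;
# the reloaded tagger uses the TransformerOnnxWordEmbeddings class internally.
from flair.models import SequenceTagger
model = SequenceTagger.load("flert-onnx-tagger.pt")
```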

Do you have any example of loading onnx files in Inference to evaluate a corpus or several sentences?

There are no examples specific to models that contain ONNX embeddings, because you use the model exactly the same way as before.
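To illustrate that ordinary usage, a short sketch (assuming `model` is a SequenceTagger that was saved with ONNX embeddings and then reloaded; the sentence text is just an example):

```python
from flair.data import Sentence

# Inference works exactly as with a non-ONNX model.
sentence = Sentence("George Washington went to Washington.")
model.predict(sentence)
print(sentence.to_tagged_string())

# Evaluating a corpus is also unchanged, e.g.:
# result = model.evaluate(corpus.test, gold_label_type="ner")
```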

@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Sep 17, 2023