Error running quantized onnx model #1543
Comments
@hossein1387 Can you share the fp32 onnx model and the quantization script input params you chose while quantizing the model?
Here is the original fp32 model, and here is the quantized version. I quantized the model using the following code:
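A minimal sketch of such a call, using the `onnxruntime.quantization` dynamic-quantization entry point; the file names follow the issue, and the exact API and parameters the reporter used may have differed:

```python
# Sketch only - the reporter's exact script is not reproduced in the thread.
# Newer onnxruntime releases expose quantize_dynamic(); older releases used a
# quantize() function with a QuantizationMode enum instead.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="VGG.onnx",         # fp32 model exported from PyTorch
    model_output="VGG_Quant.onnx",  # 8-bit model written here
    weight_type=QuantType.QUInt8,   # assumed weight type; not stated in the issue
)
```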
I think there is a serious bug in the quantizer module. I trained an MLP model, passed the trained model through the quantizer, and realized the quantizer did not do anything. After going through the code I found out that the quantizer only quantizes Conv and MatMul nodes. I don't know why a Gemm operator (which is what MLP and FC layers produce) is not treated like a MatMul. Anyway, I then trained a network with only one Conv layer and one FC layer. The following shows the graph of my network: on top is the fp32 version, and the bottom graph shows the quantized version. The model passed through the quantizer successfully, but again I was unable to run the model with ONNX Runtime and got the same error as before.
@hossein1387: Thanks for the detailed info. I will update the quantization script to include Gemm as well, and will update you once I root-cause the shape inference bug in the quantized model.
@hossein1387: The reason for the shape inference failure is an invalid default value in the quantize script which is not yet supported by the runtime. We do plan to add per-channel quantization support, but it is not available today.
Thanks for the update. I ran two models, with and without quantization; both models are VGG-like and both use CIFAR-100. Here are some results:
I don't understand why the quantized version is taking much longer than the original model. Shouldn't the 8-bit quantized model take less time than the FP32 model?
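One way to reproduce this kind of timing comparison is sketched below; the file names follow the issue, and the 1x3x32x32 input shape is an assumption based on CIFAR-100:

```python
# Sketch of a timing comparison between the fp32 and quantized models.
# File names and the (1, 3, 32, 32) input shape are assumptions taken from
# the models described in this thread.
import time
import numpy as np
import onnxruntime as ort

def bench(model_path, runs=100):
    sess = ort.InferenceSession(model_path)
    input_name = sess.get_inputs()[0].name
    x = np.random.rand(1, 3, 32, 32).astype(np.float32)
    sess.run(None, {input_name: x})  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {input_name: x})
    return (time.perf_counter() - start) / runs

print("fp32      :", bench("VGG.onnx"))
print("quantized :", bench("VGG_Quant.onnx"))
```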
@askhade any idea why I am getting these results?
@hossein1387: Which platform are you running on? We don't have optimized kernel support for Windows yet; this work is in progress. On Linux the perf should be better than on Windows, but we only support single-threaded kernels.
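For reference, the number of intra-op threads can be pinned explicitly through `SessionOptions`; a minimal sketch, using the quantized file name from the issue:

```python
# Sketch: pinning ONNX Runtime to a single intra-op thread, matching the
# single-threaded kernel behaviour described above. The model path is the
# quantized file from the issue; attribute names follow current releases.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 1  # one thread for operator-internal parallelism
session = ort.InferenceSession("VGG_Quant.onnx", opts)
```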
@hossein1387 could you provide more info on this if you still require assistance?
Thanks for your responses @askhade @faxu.
When I check my quantized model, the ONNX graph has many more operations compared to the original floating point model. I am not sure why that is, and why we can't just use 8-bit operations/operators directly, but as a result of this design choice the 8-bit model's execution time is much higher than the floating point model's. It would be awesome if someone could explain these design choices to me.
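One way to see where the extra nodes come from is to compare the operator histograms of the two graphs; a minimal sketch with the `onnx` Python package, using the file names from the issue:

```python
# Sketch: count operator types in the fp32 and quantized graphs to see the
# extra quantize/dequantize-style nodes the quantizer inserts.
from collections import Counter
import onnx

for path in ("VGG.onnx", "VGG_Quant.onnx"):
    model = onnx.load(path)
    ops = Counter(node.op_type for node in model.graph.node)
    print(path, dict(ops))
```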
I have the same problem and same question. How can I get inference acceleration on onnxruntime?
@hossein1387: Regarding achieving acceleration with quantized models: as part of the ORT 1.0 release we added optimized kernels for MatMul operations, but the optimized kernel work for convolutions is still in progress. Once that is done, the model should see a significant speedup compared to today.
@askhade Thank you very much for your response.
I have the same issue: the quantized model is roughly twice as slow and takes more time.
ONNX quantized models are still very slow today. Has this issue been solved? Please reply. Thanks a lot. @askhade
Describe the bug
Using the quantization tool I quantized `VGG.onnx` and got `VGG_Quant.onnx`. However, when I try to run the quantized model I get:
`RuntimeError: [ONNXRuntimeError] : 1 : GENERAL ERROR : Load model from VGG_Quant.onnx failed:[ShapeInferenceError] Incompatible dimensions`
Running the original onnx model (`VGG.onnx`) with the same setup (same dataset) does not produce any error. The error occurs when I try to create an `InferenceSession`; here is how I try to run my code:
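A minimal sketch of this, assuming the quantized file name above and a CIFAR-100-sized input batch; the reporter's exact snippet may have differed:

```python
# Sketch of reproducing the load failure. The model path is the quantized
# file from the issue; the (1, 3, 32, 32) input shape assumes CIFAR-100.
import numpy as np
import onnxruntime as ort

# The ShapeInferenceError is raised here, while the model is being loaded.
session = ort.InferenceSession("VGG_Quant.onnx")

input_name = session.get_inputs()[0].name
batch = np.random.rand(1, 3, 32, 32).astype(np.float32)
outputs = session.run(None, {input_name: batch})
print(outputs[0].shape)
```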
System information
The original model was downloaded from the PyTorch model zoo and then converted to ONNX (which again runs perfectly fine with onnxruntime).