numerical differences after converting to coreml #571
Hello @wmpauli, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook. If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you. If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients. For more information please visit https://www.ultralytics.com.
I guess this is not that unusual: https://developer.apple.com/forums/thread/82147
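For a sense of scale, FP16 rounding alone already introduces small relative errors; a quick illustrative sketch (not from the thread):

```python
import numpy as np

# Round-trip float32 values through float16 to estimate the relative error
# that half precision alone introduces. Illustrative only.
x = np.random.randn(10_000).astype(np.float32)
x16 = x.astype(np.float16).astype(np.float32)
rel_err = np.abs(x - x16) / (np.abs(x) + 1e-12)
print(f"max relative error: {rel_err.max():.2e}")  # on the order of 1e-3
```

Differences well beyond that order usually point to preprocessing mismatches rather than precision.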
@wmpauli seems a bit higher than I'd expect. Results will also depend on your quantization.
Thanks @glenn-jocher. I didn't do any quantization.
@wmpauli I have been facing the same issue, but in my case the error percentage is a lot higher. I am not sure where the error is. I want to double-check my inference code, so could you please post your code that runs inference on the *.mlmodel file?
@abhimanyu8713, below is the code I use for evaluation. Hopefully it is useful. You will probably have to make some changes to the constants at the top of the script. I'm still not sure why the results are so different after conversion. I suspect it is something about image normalization, or some other transform that happens either in CoreML or in PyTorch.
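@wmpauli's script is not reproduced here; as a stand-in, a minimal *.mlmodel evaluation script typically looks like the sketch below. File names, the input name, and the resolution are hypothetical and must be adapted:

```python
import coremltools as ct
import numpy as np
from PIL import Image

# --- hypothetical constants; adapt to your setup ---
MODEL_PATH = "yolov5s.mlmodel"
IMAGE_PATH = "test.jpg"
INPUT_SIZE = (640, 640)  # must match the export resolution
INPUT_NAME = "image"     # check with ct.models.MLModel(MODEL_PATH).get_spec()

model = ct.models.MLModel(MODEL_PATH)

# CoreML image inputs take a PIL image; the resize here must mirror the
# letterbox/resize used on the PyTorch side or the outputs will diverge.
img = Image.open(IMAGE_PATH).convert("RGB").resize(INPUT_SIZE)

out = model.predict({INPUT_NAME: img})  # note: predict() runs on macOS only
for name, value in out.items():
    print(name, np.asarray(value).shape)
```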
Noticed this is the case for me as well. Has any of you gotten to any solution at all? I know there are compatibility issues between PyTorch/TensorFlow upsample-like operations that lead to differences at times (some info here), but as per my checks, yolov5 should be OK. @glenn-jocher: did you benchmark your YOLOv5 exported model against the results you get with the trained checkpoints using test.py?
@dlawrences we've benchmarked FP16/32 changes as negligible during the FP16 update; other than that, no, we do not test exported models using test.py. When quantizing in CoreML you can clearly see progressively worse anecdotal results in iDetection at higher quantization levels, using both kmeans and linear methods.
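For reference, weight quantization of this kind is done with the coremltools quantization utilities; a minimal sketch with a hypothetical model path:

```python
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

model = ct.models.MLModel("yolov5s.mlmodel")  # hypothetical path

# Quantize weights to 8 bits; "kmeans" and "linear" are the two modes
# mentioned above. Lower nbits shrinks the model but degrades accuracy.
# (On non-macOS platforms this may return a spec rather than an MLModel.)
model_q = quantization_utils.quantize_weights(model, nbits=8, quantization_mode="kmeans")
model_q.save("yolov5s-int8.mlmodel")
```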
@glenn-jocher thanks, Glenn. For clarity: you are still relying on the PyTorch > ONNX > CoreML conversion path, right? Overall, I think it is probably related to this bit: apple/coremltools#831. I will dig through it. Cheers
@dlawrences yes, but this is a CoreML step, so it may or may not depend on the route the model took to get there.
@dlawrences, it is my understanding that the conversion is actually PyTorch > traced PyTorch > CoreML, i.e. without ONNX. Also, I don't get the error message mentioned in apple/coremltools#831.
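For reference, that direct path looks roughly like the sketch below, using the coremltools 4.x unified converter. The stand-in module and the scale/bias values are illustrative assumptions, not the repo's exact export code:

```python
import torch
import torch.nn as nn
import coremltools as ct

# Stand-in module for illustration; in practice this is the loaded
# YOLOv5 model in eval() mode.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).eval()

example = torch.zeros(1, 3, 640, 640)     # export resolution
traced = torch.jit.trace(model, example)  # PyTorch -> traced PyTorch

# traced PyTorch -> CoreML, with no ONNX step. scale/bias implement the
# 0-1 image normalization YOLOv5 expects; verify they match your
# training-time preprocessing, since a mismatch here produces exactly
# the kind of output differences discussed in this issue.
mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(name="image", shape=tuple(example.shape),
                         scale=1 / 255.0, bias=[0, 0, 0])],
)
mlmodel.save("model.mlmodel")
```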
Hi @wmpauli, thanks for sharing the code. My code is almost identical to yours; the only thing I have not included is …
I didn't get why we need to reverse the prediction array. Can you shed some light on this? Also, for the CoreML model, did you use …
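For anyone decoding the raw head outputs by hand, the inference-time decode in models/yolo.py works essentially like the NumPy re-implementation below; this is where head/stride/anchor ordering enters, which is likely what the array reversal compensates for. Shapes and ordering are assumptions to verify against your own export:

```python
import numpy as np

def decode_head(raw: np.ndarray, anchors_px: np.ndarray, stride: float) -> np.ndarray:
    """Decode one raw YOLOv5 head output of shape (na, ny, nx, no) into
    pixel-space (x, y, w, h, obj, cls...) rows, mirroring models/yolo.py."""
    na, ny, nx, no = raw.shape
    y = 1.0 / (1.0 + np.exp(-raw))                           # sigmoid everything
    gx, gy = np.meshgrid(np.arange(nx), np.arange(ny))       # cell indices
    grid = np.stack((gx, gy), axis=-1)[None]                 # (1, ny, nx, 2)
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride  # box centers, pixels
    y[..., 2:4] = (y[..., 2:4] * 2.0) ** 2 * anchors_px.reshape(na, 1, 1, 2)
    return y.reshape(-1, no)

# Each of the three heads must be decoded with its matching (stride, anchors)
# pair; if the exported model emits heads in a different order, the boxes
# land in the wrong places.
```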
Update: I now get way better results than before on device using YOLOv5s just by upgrading to … I have not yet benchmarked the same footage against the …
@dlawrences I wonder if we should add the export dependencies (onnx, coremltools==4.0b2) to requirements.txt. I haven't so far because I suspect the vast majority of users don't need them. The way I handled this for now is shown in requirements.txt, lines 1 to 14 in 66744a0.
Other repos, like pytorch lightning, have a requirements folder with different requirements.txt files split by use case, so that's another option (i.e. requirements/base.txt and requirements/export.txt).
@glenn-jocher I think you should use …
@dlawrences ok! I've updated requirements.txt now with different sections, with only the base section uncommented. For export I have this (requirements.txt, lines 19 to 22 in 7b2b521).
I have a feeling I should separate this into its own requirements/export.txt file, to allow simple export-related pip installs, but I'd like to minimize adding directories and files as much as possible. torch 1.6 is not compatible with coremltools 4.0b2, and onnx 1.7 has its own issue with unsupported Hardswish layers. I've raised a Hardswish issue at onnx/onnx#2728 (comment). v2.0 models export correctly via both onnx and coremltools using torch 1.5.1, however, so I believe the best workflow is to train LeakyReLU() models if they will require export in the short term, and then to export in a torch 1.5.1 environment.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@wmpauli I used your code to evaluate a CoreML model, but the bounding boxes are placed wrong. I noticed that you have a function plot_one_box; can you also share that code so I can double-check it against mine? Edit: fixed it. The problem was the stride order. In my model the order had to be stride = [8, 32, 16]. Be sure to check the anchors as well.
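A quick way to check which order a checkpoint actually uses, assuming the standard YOLOv5 checkpoint layout where the Detect head is the last submodule:

```python
import torch

# Inspect the Detect head's stride/anchor order in a YOLOv5 checkpoint.
# Run from the yolov5 repo root so the pickled model classes can be imported.
ckpt = torch.load("yolov5s.pt", map_location="cpu")  # hypothetical path
detect = ckpt["model"].model[-1]
print(detect.stride)   # e.g. tensor([ 8., 16., 32.]); decode heads in this order
print(detect.anchors)  # per-head anchor sizes; pair each set with its stride
```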
Hi @glenn-jocher. My neural network is 29 MB instead of 7.7 MB; I assume yours is the quantized version. Also, my model type is Neural Network, while yours is Neural Network -> Non-Maximum Suppression. Are there any additional steps to take when exporting or quantizing the network in order to add NMS to my .mlmodel? Thanks
@OctaM Hi, did you figure out the NMS part?
@glenn-jocher Can we get more info on the NMS part in the exported model? I also opened a new issue, #5157, since my exported model is not working well!
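The usual way to get a "Neural Network -> Non-Maximum Suppression" model is to build a separate CoreML NMS stage and chain both in a pipeline. Below is a condensed sketch of that widely used pattern; it assumes the detection model already exposes two outputs, raw_confidence (boxes x classes) and raw_coordinates (boxes x 4, normalized xywh). Those names, the file path, and the extra decode step they imply are assumptions, not something YOLOv5's plain export produces:

```python
import coremltools as ct
from coremltools.proto import Model_pb2
from coremltools.models import datatypes
from coremltools.models.pipeline import Pipeline

det_model = ct.models.MLModel("yolov5s_decoded.mlmodel")  # hypothetical model that
det_spec = det_model.get_spec()                           # emits the two outputs below

# Build a standalone NMS model whose inputs mirror the detector's outputs.
nms_spec = Model_pb2.Model()
nms_spec.specificationVersion = 4
for i in range(2):
    desc = det_spec.description.output[i].SerializeToString()
    nms_spec.description.input.add().ParseFromString(desc)
    nms_spec.description.output.add().ParseFromString(desc)
nms_spec.description.output[0].name = "confidence"
nms_spec.description.output[1].name = "coordinates"
# (Production code usually also relaxes the output shape constraints here,
# since the number of surviving boxes is variable.)

nms = nms_spec.nonMaximumSuppression
nms.confidenceInputFeatureName = "raw_confidence"    # boxes x classes
nms.coordinatesInputFeatureName = "raw_coordinates"  # boxes x 4, normalized xywh
nms.confidenceOutputFeatureName = "confidence"
nms.coordinatesOutputFeatureName = "coordinates"
nms.iouThreshold = 0.45
nms.confidenceThreshold = 0.25
nms.pickTop.perClass = True
nms_model = ct.models.MLModel(nms_spec)

# Chain detector + NMS into a single pipeline model.
pipeline = Pipeline(
    input_features=[("image", datatypes.Array(3, 640, 640))],  # placeholder type
    output_features=["confidence", "coordinates"],
)
pipeline.add_model(det_model)
pipeline.add_model(nms_model)
# Copy the detector's real image input description over the placeholder.
pipeline.spec.description.input[0].ParseFromString(
    det_spec.description.input[0].SerializeToString()
)
ct.models.MLModel(pipeline.spec).save("yolov5s_nms.mlmodel")
```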
Original issue description:

How big of a numerical difference in output would be acceptable between the PyTorch output and the CoreML output?
Additional context
Thank you for sharing your code. I have trained my own model and it works well. I converted it to CoreML and see some differences in behavior. In the CoreML version, I have to dial down the confidence threshold to get the same results, though I doubt that this will generally be a reliable solution.
I tried to debug this a bit. In PyTorch, if I set a breakpoint at line 38 of models/yolo.py, I get this: … but when running the CoreML model on the same image, I get: …
Is this in the realm of what one would expect? (I'm new to CoreML.) Do I need to make changes to scale or bias during conversion? I played around with that a bit, but the settings in the export script seem to be the best.
Is it generally expected that one has to play around with conf_thres and iou_thres after conversion?
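For quantifying the gap between the two runs, a small helper like this can put a number on it (a sketch with placeholder array names, not from the original report):

```python
import numpy as np

def compare_outputs(pt_out: np.ndarray, cm_out: np.ndarray) -> None:
    """Quantify the gap between PyTorch and CoreML raw outputs for the same
    image; both arrays must come from identical preprocessing and shapes."""
    abs_err = np.abs(pt_out - cm_out)
    rel_err = abs_err / (np.abs(pt_out) + 1e-6)
    print(f"max abs err: {abs_err.max():.3g}  mean rel err: {rel_err.mean():.3g}")
    # Rule of thumb: FP16 alone explains relative errors around 1e-3; much
    # larger gaps usually mean a preprocessing mismatch (scale/bias, RGB vs
    # BGR, letterboxing) rather than numerical precision.
```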