Verification of the resnet50 model fails with the fp16 quantization option #2156

Closed
stefankoncarevic opened this issue Sep 6, 2023 · 21 comments · Fixed by #2182

stefankoncarevic commented Sep 6, 2023

Description

After fp16 quantization of the ResNet50.onnx model, I get a mismatch error:

./bin/migraphx-driver verify --gpu --onnx /MIGraphXDeps/resnet50-v1-7.onnx --fp16

...
FAILED: /MIGraphXDeps/resnet50-v1-7.onnx
error: 0.00123406
Max diff: 0.0360341
Mismatch at 0: -0.624751 != -0.62793

fp16-ResNet50-migraphx.txt

MLIR was disabled before this run.
This blocks adding CI tracking for fp16 and int8 quantized ONNX models.
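
For illustration, a minimal numpy sketch (not MIGraphX's verifier code) of why a threshold derived from fp32 machine epsilon flags outputs that only differ by fp16 rounding; the tolerance factor of 80 comes from the discussion later in this thread:

```python
# Illustrative only: compare an fp16 round-trip of an fp32 reference against
# thresholds derived from each type's machine epsilon.
import numpy as np

ref = np.random.randn(1000).astype(np.float32)    # stand-in fp32 reference output
out = ref.astype(np.float16).astype(np.float32)   # same values after fp16 rounding

rel_err = np.abs(out - ref) / np.maximum(np.abs(ref), 1e-12)

tolerance = 80                                       # default factor discussed below
fp32_thresh = np.finfo(np.float32).eps * tolerance   # ~9.5e-6
fp16_thresh = np.finfo(np.float16).eps * tolerance   # ~7.8e-2

print(rel_err.max())                  # typically a few 1e-4, on the order of fp16 epsilon
print((rel_err > fp32_thresh).any())  # True: flagged by an fp32-scale threshold
print((rel_err > fp16_thresh).any())  # False: passes a type-aware fp16 threshold
```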

stefankoncarevic (Author)

@jerryyin

jerryyin (Member) commented Sep 6, 2023

Did you run into this issue when using ONNX Runtime to verify as well?

In today's meeting discussion, we concluded that this issue is specifically related to MIGraphX's CPU verifier. It would be desirable if you could get the additional data point from ONNX Runtime verification.

umangyadav (Member)

It would be desirable if you could get the additional data point from ONNX Runtime verification.

Yes, I've seen such failures on ResNet50 fp16 verification. @stefankoncarevic, if you run with real data it should meet the required accuracy.

jerryyin (Member)

@umangyadav Regardless of what data @stefankoncarevic ran with, the driver verify shouldn't fail. If the run passes with real data, then it is still a bug in the migraphx driver because it didn't report the correct result.

The point is that a patch is needed so that migraphx driver verification uses an fp16 threshold even when the model produces an fp32 result.

umangyadav (Member)

I agree that migraphx-driver requires a change to its threshold logic. My point was that even though migraphx-driver reports an accuracy failure, it shouldn't affect accuracy in real use cases and therefore shouldn't be a blocker.

jerryyin (Member)

it shouldn't affect accuracy in real use cases and therefore shouldn't be a blocker.

Can you give us that guarantee? That we should ignore the migraphx driver verify result regardless of what it reports?

umangyadav (Member) commented Sep 13, 2023

Can you give us that guarantee?

I am pretty sure for resnet50 (based on my experience with UIF/MLPerf development), but I can't say for sure for other models. This issue specifically mentioned resnet50.

If the driver verification is relied upon for other models or further development, then yes, it would be better to solve it. (I've also seen migraphx-driver failing with BERT/GPT but passing when using real data, but I am less confident about those.)

jerryyin (Member)

This issue specifically mentioned resnet50.

Resnet50 is just one example. We desperately need a mechanism that gives us a deterministic result reflecting the correctness of a run. Whether this ticket or #2181 is the priority is up to your and @causten's judgement. Pick either one to start with, so we have a way to routinely track this as a regression test.

If you already knew resnet50 was returning an incorrect result, why wasn't this fixed before, or at least tracked with a ticket? It concerns me that these fixes require the MLIR team to drive them.

If the driver verification is relied upon for other models or further development

Nope — I believe we have run this on multiple models and gotten mostly failing results, with or without MLIR enabled. You should fix this issue so we can do a second round of regression checks.

pfultz2 (Collaborator) commented Sep 13, 2023

We desperately need a mechanism that gives us a deterministic result reflecting the correctness of a run.

You should use the model's test data with tools/test_runner.py to verify the results. It's much faster and more accurate than using driver verify.

jerryyin (Member) commented Sep 13, 2023

@pfultz2 Thanks for the pointer. How hard would it be to automate this in our CI so it runs on a nightly basis? We can transition to this if it is the only way to verify correctness.

@stefankoncarevic Could you file a ticket on the MLIR side to track this? Also, could you follow up with @pfultz2 (Paul Fultz) on how to use test_runner.py instead?

pfultz2 (Collaborator) commented Sep 13, 2023

How hard would it be to automate this in our CI so it runs on a nightly basis?

You just need the models and test data on the machine (usually these are packaged together), and then run python3 tools/test_runner.py <path-to-model-and-data-directory>.
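
As an illustration of how this could be wired into a nightly job, here is a hypothetical wrapper. The /data/onnx-models path, the directory-layout check, and the assumption that test_runner.py exits non-zero on a verification failure are all assumptions, not part of MIGraphX:

```python
#!/usr/bin/env python3
# Hypothetical nightly wrapper around tools/test_runner.py.
# Assumes the ONNX model zoo packaging: each model directory holds a *.onnx
# file plus one or more test_data_set_*/ folders, and that test_runner.py
# exits non-zero when verification fails.
import subprocess
import sys
from pathlib import Path

MODELS_ROOT = Path("/data/onnx-models")  # assumed local store of models + test data

failures = []
for model_dir in sorted(p for p in MODELS_ROOT.iterdir() if p.is_dir()):
    if not (any(model_dir.glob("*.onnx")) and any(model_dir.glob("test_data_set_*"))):
        continue  # skip directories not packaged with test data
    result = subprocess.run(["python3", "tools/test_runner.py", str(model_dir)])
    if result.returncode != 0:
        failures.append(model_dir.name)

print("failed models:", ", ".join(failures) if failures else "none")
sys.exit(1 if failures else 0)
```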

jerryyin (Member)

You just need the models and test data on the machine (usually these are packaged together).

Are the models and test data available on a mount point anywhere? Downloading them on the fly for CI machines would be inconvenient.

pfultz2 (Collaborator) commented Sep 13, 2023

I've also seen migraphx-driver failing with BERT/GPT but passing when using real data, but I am less confident about those.

So BERT-SQuAD fails when using --fp16 (-8.23 vs -7.7425637). Setting the tolerance to 1 (instead of 1e-3) makes it pass.
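
Quick arithmetic on those two values, assuming a simple relative-error comparison:

```python
# -8.23 (fp16 run) vs -7.7425637 (reference): about 6% relative error,
# which fails a 1e-3 tolerance but passes a tolerance of 1.
a, b = -8.23, -7.7425637
print(abs(a - b) / abs(b))  # ~0.063
```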

pfultz2 (Collaborator) commented Sep 13, 2023

Are the models and test data available on a mount point anywhere?

There is no mount point. There is a zip file somewhere (it was on rome6 and mi250, but those machines are not accessible or were wiped clean) with a bunch of models, and you can download the test data from the ONNX model zoo.

Downloading them on the fly for CI machines would be inconvenient.

It can get pretty big, so you definitely want to store it on the node you are using (or mount a drive somehow).

jerryyin (Member)

@pfultz2 I'm going to ask @stefankoncarevic to follow up with you on this. I think this will be a slightly different conversation from this ticket, because whether this can be used in the MLIR nightly CI isn't clear to me yet, and whether the test_runner mechanism will work is an orthogonal topic to this ticket and #2181.

@causten Is there any way we can have the unzipped contents of that zip file placed on the NAS mount point?

If not, I'd like either this or #2181 to be worked on as a priority, since they are straightforward to use in the MLIR CI. Not having such a regression test poses a significant risk to the ROCm 6.0 release deliverables.

stefankoncarevic (Author) commented Sep 14, 2023

@pfultz2 Where can I get the real data for the models?
This is important to me because I want to run test_runner.py and see the output for the different models.

I changed the tolerance in accuracy_checker to 1, and after that a lot of models now pass for fp16 and int8, but I'm not sure it's the right value because it's a very large number compared with 1e-3.
For migraphx-driver the threshold is much smaller, around 4e-6 for fp32: double threshold = epsilon() * tolerance, where tolerance is 80 by default and epsilon, I think, depends on the type (fp32, fp16, or int8)?
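
As a back-of-the-envelope check of that formula (assuming epsilon() is the machine epsilon of the compute type), the thresholds work out roughly as below; note that the fp16 value lands near the --rms-tol 8e-2 suggested later in this thread:

```python
# threshold = epsilon() * tolerance, with the default tolerance of 80
# (assuming epsilon() is the machine epsilon of the compute type).
import numpy as np

tolerance = 80
for name, dtype in [("fp32", np.float32), ("fp16", np.float16)]:
    eps = float(np.finfo(dtype).eps)
    print(f"{name}: epsilon = {eps:.3g}, threshold = {eps * tolerance:.3g}")

# fp32: epsilon ~ 1.19e-07, threshold ~ 9.5e-06
# fp16: epsilon ~ 9.77e-04, threshold ~ 7.8e-02 (close to --rms-tol 8e-2)
```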

jerryyin (Member)

@stefankoncarevic I don't think it is realistic for us to build infrastructure for MIGraphX's model verification. That is out of scope for this release. Next release we can consider building something that better isolates strictly the MLIR components.

For the purpose of completing #2182, feel free to bump up the accuracy tolerance to get a full pass. Since verification doesn't work even without MLIR, this is out of our hands. We can reduce the tolerance once it is fixed in MIGraphX.

For this ticket, skip the models that have a verification failure until MIGraphX fixes it.

jerryyin (Member)

@TedThemistokleous @pfultz2 Correct me if I'm wrong, but I don't think the test runner PR should be linked to this issue. This issue is about the MIGraphX verifier and is orthogonal to that PR.

Based on this understanding I'm re-opening it. Feel free to close again if I misunderstood.

CharlieL7 (Collaborator)

We need to update the tolerances for fp16 and other data type calculations. Currently we can get around this by using #2213 and adding the driver verify option --rms-tol 8e-2.

umangyadav (Member)

@CharlieL7 does #2334 fix this issue?

CharlieL7 (Collaborator)

Yes, should be fixed with #2334.
