MIGraphX accuracy_checker problem when running accuracy checks for different models (BERT, GPT-2) #2181

Open
stefankoncarevic opened this issue Sep 13, 2023 · 3 comments · Fixed by #2310

@stefankoncarevic

Description

When I run the accuracy_checker with this command, without MLIR:
python accuracy_checker.py --fill1 --onnx /pathtomodel/bert_large_uncased_1_fp16_gpu.onnx

I get the message:
Outputs do not match
FAILED: MIGraphX is not within tolerance.

This is the output when running the same command with --verbose:
python accuracy_checker.py --fill1 --onnx /pathtomodel/bert_large_uncased_1_fp16_gpu.onnx --verbose

Output 0 is incorrect
Expected value:
[-0.90534556 -0.8867094 -0.05258729 ... 0.5805738 0.7309614
-0.9049778 ]
....
Actual value:
[-0.9003906 -0.8847656 0.0069809 ... 0.55908203 0.7246094
-0.9038086 ]

I added an argument parser option and a migraphx.quantize_fp16 call to the Python script. After that the values are much closer, but the check still fails (see the sketch below).

This is the command:
python accuracy_checker.py --fill1 --onnx /pathtomodel/bert_large_uncased_1_fp16_gpu.onnx --fp16 --verbose

This is the output:

Output 0 is incorrect ...
Expected value:
[-0.8792089 -0.8541642 0.2681412 ... 0.3513783 0.71706283
-0.9114238 ]
......
Actual value:
[-0.8730469 -0.85009766 0.28051758 ... 0.34179688 0.70458984
-0.90966797]

The tolerance in both cases is 1e-3.
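
Roughly, the change looks like this (an illustrative sketch, not the exact accuracy_checker.py code; the flag name and the surrounding structure are simplified, and it assumes the MIGraphX Python API calls parse_onnx, quantize_fp16, and get_target):

import argparse
import migraphx

parser = argparse.ArgumentParser()
parser.add_argument("--onnx", required=True, help="path to the ONNX model")
parser.add_argument("--fp16", action="store_true",
                    help="quantize the program to fp16 before compiling")
args = parser.parse_args()

# Parse the model, optionally quantize to fp16, then compile for the GPU target
prog = migraphx.parse_onnx(args.onnx)
if args.fp16:
    migraphx.quantize_fp16(prog)
prog.compile(migraphx.get_target("gpu"))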

I also ran bert_base_cased_1_fp16_gpu.onnx and distilgpt2_1_fp16_gpu.onnx and got the same messages.
I also tried with MLIR enabled, but the result is the same.

@stefankoncarevic
Author

@jerryyin

@CharlieL7
Collaborator

For the BERT models, the accuracy is fixed with the --disable-fast-math option on the driver. I'm updating the accuracy checker to add an option to pass this through to the MIGraphX runner. The tolerance also needs to be increased from 1e-3 to around 8e-2, because the machine epsilon of fp16 is about 1e-3.
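
For reference, the fp16 machine epsilon can be checked with NumPy:

import numpy as np

# Machine epsilon of IEEE half precision: ~9.77e-04, i.e. roughly 1e-3
print(np.finfo(np.float16).eps)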

@CharlieL7
Collaborator

The test will pass with #2298 and the proper Python call.
For example:

python3 accuracy_checker.py --onnx /codes/onnx_models/bert_models/bert_base_cased_1_fp16_gpu.onnx --fill1 --input-dim input_ids:2,128 --disable-fast-math --tolerance 8e-2
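
For illustration only (not necessarily the checker's exact comparison logic), the effect of relaxing the tolerance can be seen with the sample values from the first post:

import numpy as np

expected = np.array([-0.8792089, -0.8541642, 0.2681412])
actual = np.array([-0.8730469, -0.85009766, 0.28051758])

# Fails at the old tolerance, passes at the relaxed one
print(np.allclose(expected, actual, rtol=1e-3, atol=1e-3))  # False
print(np.allclose(expected, actual, rtol=8e-2, atol=8e-2))  # True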

CharlieL7 linked a pull request on Oct 13, 2023 that will close this issue.