MIGraphX accuracy_checker problem when running accuracy checks for different models (BERT, GPT-2) #2181

Open
stefankoncarevic opened this issue Sep 13, 2023 · 3 comments · Fixed by #2310

@stefankoncarevic

Description

When I run the accuracy_checker with this command, without MLIR:
python accuracy_checker.py --fill1 --onnx /pathtomodel/bert_large_uncased_1_fp16_gpu.onnx

I get the message:
Outputs do not match
FAILED: MIGraphX is not within tolerance.

This is the output when running the same command with --verbose:
python accuracy_checker.py --fill1 --onnx /pathtomodel/bert_large_uncased_1_fp16_gpu.onnx --verbose

Output 0 is incorrect
Expected value:
[-0.90534556 -0.8867094 -0.05258729 ... 0.5805738 0.7309614
-0.9049778 ]
....
Actual value:
[-0.9003906 -0.8847656 0.0069809 ... 0.55908203 0.7246094
-0.9038086 ]

I added an argument parser option and a migraphx.quantize_fp16 call to the Python script. After that the values are much closer, but the check still fails (see the sketch below).

This is the command:
python accuracy_checker.py --fill1 --onnx /pathtomodel/bert_large_uncased_1_fp16_gpu.onnx --fp16 --verbose

This is the output:

Output 0 is incorrect ...
Expected value:
[-0.8792089 -0.8541642 0.2681412 ... 0.3513783 0.71706283
-0.9114238 ]
......
Actual value:
[-0.8730469 -0.85009766 0.28051758 ... 0.34179688 0.70458984
-0.90966797]

The tolerance in both cases is 1e-3.
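
Roughly, the change looks like this (an illustrative sketch, not the exact accuracy_checker.py code; the flag name and the surrounding structure are simplified, and it assumes the MIGraphX Python API calls parse_onnx, quantize_fp16, and get_target):

import argparse
import migraphx

parser = argparse.ArgumentParser()
parser.add_argument("--onnx", required=True, help="path to the ONNX model")
parser.add_argument("--fp16", action="store_true",
                    help="quantize the program to fp16 before compiling")
args = parser.parse_args()

# Parse the model, optionally quantize to fp16, then compile for the GPU target
prog = migraphx.parse_onnx(args.onnx)
if args.fp16:
    migraphx.quantize_fp16(prog)
prog.compile(migraphx.get_target("gpu"))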

I also ran bert_base_cased_1_fp16_gpu.onnx and distilgpt2_1_fp16_gpu.onnx and got the same messages.
I also tried with MLIR enabled, but the result is the same.

@stefankoncarevic
Author

@jerryyin

@CharlieL7
Collaborator

For the BERT models, the accuracy is fixed with the --disable-fast-math option on the driver. I'm updating the accuracy checker to add an option to pass this through to the MIGraphX runner. The tolerance also needs to be increased from 1e-3 to around 8e-2, because the machine epsilon of fp16 is about 1e-3.
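
For reference, the fp16 machine epsilon can be checked with NumPy:

import numpy as np

# Machine epsilon of IEEE half precision: ~9.77e-04, i.e. roughly 1e-3
print(np.finfo(np.float16).eps)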

@CharlieL7
Collaborator

The test will pass with #2298 and the proper Python call.
For example:

python3 accuracy_checker.py --onnx /codes/onnx_models/bert_models/bert_base_cased_1_fp16_gpu.onnx --fill1 --input-dim input_ids:2,128 --disable-fast-math --tolerance 8e-2
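
For illustration only (not necessarily the checker's exact comparison logic), the effect of relaxing the tolerance can be seen with the sample values from the first post:

import numpy as np

expected = np.array([-0.8792089, -0.8541642, 0.2681412])
actual = np.array([-0.8730469, -0.85009766, 0.28051758])

# Fails at the old tolerance, passes at the relaxed one
print(np.allclose(expected, actual, rtol=1e-3, atol=1e-3))  # False
print(np.allclose(expected, actual, rtol=8e-2, atol=8e-2))  # True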

CharlieL7 linked a pull request on Oct 13, 2023 that will close this issue.