
Big endian issue: Graph Transformation Attention Fusion tests are failing #12921

Closed
tvkai opened this issue Sep 10, 2022 · 30 comments
Labels
model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.

Comments

@tvkai
Contributor

tvkai commented Sep 10, 2022

Describe the issue

On AIX, which is a big-endian platform, I am observing that the following tests fail when onnxruntime is built from source.

1: [ FAILED ] GraphTransformationTests.AttentionFusionInt32Test
1: [ FAILED ] GraphTransformationTests.AttentionFusionInt64Test
1: [ FAILED ] GraphTransformationTests.AttentionFusionFloat32Test

The underlying issue seems to be that onnxruntime adds initializers to the graph during graph transformation. These initializers contain raw_data. Under the current implementation, the graph should hold little-endian data, and whenever the data is processed, UnpackTensors takes care of the machine's endianness. The problem is that on a big-endian platform the data is already in BE, so the graph ends up containing BE data instead of LE.

Below is the place where the actual issue seems to be.

In the file: onnxruntime/core/optimizer/attention_fusion.cc
Function: MergeQkvWeights

if (data_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT) {
  const float* q_weight = q_initializer.data<float>();
  const float* k_weight = k_initializer.data<float>();
  const float* v_weight = v_initializer.data<float>();
  std::vector<float> result;
  result.reserve(gsl::narrow<size_t>(element_count));
  if (is_matmul) {
    MergeMatMulWeights(q_weight, k_weight, v_weight, result, hidden_size);
  } else {
    MergeWeights(q_weight, k_weight, v_weight, result, hidden_size);
  }
  initializer.set_raw_data(result.data(), gsl::narrow<size_t>(element_count) * sizeof(float)); // <=== Here we set raw_data with BE data when running on a big-endian platform.

The else block has the same issue: the initializer is formed with BE data and added to the graph.

One way to fix this issue is to do byte swapping before we set the raw_data on big-endian platforms.
I tested byte-swapping the data just before setting the raw data, and the tests pass.
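For illustration, here is a minimal sketch of such a swap as a standalone helper (the name is hypothetical; the fix that eventually landed uses a shared utility, described later in this thread):

#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reverse the bytes of each element in a raw buffer so
// that a big-endian host stores little-endian bytes in the TensorProto's
// raw_data. element_size is sizeof(float), sizeof(int64_t), etc.
inline void SwapBytesToLittleEndian(void* data, size_t byte_count, size_t element_size) {
  auto* bytes = static_cast<uint8_t*>(data);
  for (size_t i = 0; i + element_size <= byte_count; i += element_size) {
    std::reverse(bytes + i, bytes + i + element_size);
  }
}

Called just before the set_raw_data line above, guarded to big-endian builds (e.g. with #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__), this would leave the graph holding LE bytes as UnpackTensors expects.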

To reproduce

Run the graph transformation tests on a big-endian platform like AIX on IBM Power.

Urgency

No response

Platform

Other / Unknown

OS Version

AIX

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

latest

ONNX Runtime API

C

Architecture

IBM Power

Execution Provider

Default CPU

Execution Provider Library Version

No response

@madurais
Contributor

The raw_data initializers read from onnx or ort files are in LE format. They should be converted to BE on BE systems. But the initializers that are input to any node are expected to be in LE before inference runs in RunInferencing() in graph.cc, since the onnx code expects the data in LE.

In the above code it is a fusion test, so the initializer raw data has to be byte-swapped on BE platforms, either before the merge or after it.

@tvkai
Contributor Author

tvkai commented Sep 12, 2022

@snnn @tianleiwu @skottmckay @edgchen1, could you let us know your comments on this issue?

@tianleiwu
Contributor

tianleiwu commented Sep 12, 2022

@tvkai, thanks for reporting the issue and identifying the root cause. Since you have a fix, could you please submit a pull request?

@snnn, could we add a build pipeline on a big-endian platform? We could run it occasionally (e.g., before each release) to avoid similar issues.

@tvkai
Contributor Author

tvkai commented Sep 14, 2022

@tianleiwu, @snnn, thanks a lot for your views on this issue. While testing on AIX, we observed multiple other places in the code that add initializers with raw_data to the graph. We may need to fix all the places where byte swapping is needed.

The initializers are added in the following places:

  1. onnxruntime/core/optimizer/attention_fusion.cc

     An initializer is added to the graph in MergeQkvWeights().

  2. onnxruntime/core/optimizer/constant_folding.cc

     A shape_constant initializer is added to the graph in ConstantFoldShapeNode().

  3. onnxruntime/core/optimizer/embed_layer_norm_fusion.cc

     An initializer is added to the graph in ExtractEmbedding().

  4. onnxruntime/core/optimizer/transpose_optimizer/transpose_optimizer.cc

     An initializer is added to the graph.

  5. onnxruntime/core/optimizer/transpose_optimizer/optimizer_api_impl.cc

     An initializer is added to the graph in ApiGraph::AddInitializer().

The above are the core runtime areas; there are similar places in the test folder as well.
We may need a common way to convert and add initializers to the graph.

To avoid future misses, we might need a better way to handle big-endian platform support.

One approach is, on big-endian systems, to convert the graph's raw_data to big-endian values when loading from a model file; the existing UnpackTensors could then skip the conversion (skip ReadLittleEndian()). One issue with this approach is RunInferencing() (schema->GetTypeAndShapeInferenceFunction()(*this)), which invokes ONNX namespace routines that always expect raw data in little-endian format. However, the advantage of this approach for onnxruntime is fewer changes across the code, and in the future we would be less likely to miss places like those above.

@tvkai
Contributor Author

tvkai commented Sep 15, 2022

@snnn @tianleiwu @skottmckay @edgchen1, any suggestions here?

@madurais
Contributor

We tried the approach mentioned above by @tvkai, i.e., converting the graph's raw_data to big-endian values when loading from a model file so that the existing UnpackTensors can skip the conversion (skip ReadLittleEndian()). We also changed the code to convert a node's input initializers back to little-endian before calling RunInferencing(), so that input initializers are in little-endian as the onnx code expects.

All onnxruntime_test_all test suites pass.

@skottmckay
Contributor

@madurais are you able to create a PR with the changes so they can be reviewed?

@madurais
Contributor

Yes, Skott. I tried the changes and they work. I am making some more changes to avoid some errors, and will create a PR by next week.

@tvkai
Contributor Author

tvkai commented May 15, 2024

@skottmckay @tianleiwu @snnn @edgchen1

We tried both approaches on latest main.

  1. Whenever we set raw_data in the graph, we convert the data to big-endian on big-endian platforms. We identified all the potential places (in core as well as test code) for this change, and all tests (C and C++) pass.
  2. The other approach converts the graph's raw_data to big-endian values when loading from a model file, so the existing unpack tensors can skip the conversion. But we are observing one test failure (AttentionTest.Attention_Mask1D_Fp32_B2_S64).

Any suggestions on which approach we should focus on?

@ranjitshs
Contributor

We can raise PRs for both approaches for review.

@tianleiwu
Contributor

tianleiwu commented May 15, 2024

> @skottmckay @tianleiwu @snnn @edgchen1
>
> We tried both approaches on latest main.
>
> 1. Whenever we set raw_data in the graph, we convert the data to big-endian on big-endian platforms. We identified all the potential places (in core as well as test code) for this change, and all tests (C and C++) pass.
> 2. The other approach converts the graph's raw_data to big-endian values when loading from a model file, so the existing unpack tensors can skip the conversion. But we are observing one test failure (AttentionTest.Attention_Mask1D_Fp32_B2_S64).
>
> Any suggestions on which approach we should focus on?

#2 looks good to me.

For these two approaches, I suggest running a manual test that saves the optimized model to disk using session options and then loads it again with ORT, to see whether the optimized model still works.

Regarding AttentionTest.Attention_Mask1D_Fp32_B2_S64, you can adjust the tolerance threshold, since it uses random inputs (so sometimes the difference might exceed the threshold). It might not be related to big endian (otherwise, similar tests like Attention_Mask2D_Fp32_B2_S32 and Attention_NoMask_Fp16 would also fail).

@ranjitshs
Contributor

@tianleiwu
Thanks for the reply and the suggestions.

  1. I am not able to find a C/C++ API with which I can save the model to disk. Could you please point us to a reference for the API usage?

  2. Regarding AttentionTest.Attention_Mask1D_Fp32_B2_S64, I modified the tolerance threshold from 0.002 to 0.005; with this value it is still failing.

With 0.002f, 27834 entries were mismatched, but with 0.005f, 27337 entries mismatch. The total number of entries in the output array is 98304.

@snnn added the model:transformer label on May 16, 2024
@tianleiwu
Contributor

> 1. I am not able to find a C/C++ API with which I can save the model to disk. Could you please point us to a reference for the API usage?
> 2. Regarding AttentionTest.Attention_Mask1D_Fp32_B2_S64, I modified the tolerance threshold from 0.002 to 0.005; with this value it is still failing.

@ranjitshs,

For the first question, here is an example of the C++ API (there is another example for the C API on the same page):
https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#c-api-example-2
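Following that pattern, a minimal sketch (file paths are placeholders; the narrow-char paths assume a non-Windows build where ORTCHAR_T is char):

#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "save_optimized");
  Ort::SessionOptions session_options;
  session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
  // Serialize the optimized graph to disk while the session is created.
  session_options.SetOptimizedModelFilePath("optimized_model.onnx");
  Ort::Session session(env, "model.onnx", session_options);
  // Re-loading optimized_model.onnx in a fresh session then verifies that the
  // serialized initializers round-trip correctly on a big-endian host.
  return 0;
}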

For the second question, could you try changing one line of the test code of Attention_Mask1D_Fp32_B2_S64 from

mask_index_data.push_back(i == 0 ? sequence_length : (sequence_length / 2));

to

mask_index_data.push_back(sequence_length);

That could make the result more deterministic.

@ranjitshs
Contributor

@tianleiwu
Thanks for the link and the code-change suggestion.

  1. I am able to save the optimized model using SetOptimizedModelFilePath() and reload it properly.

  2. The Attention_Mask1D_Fp32_B2_S64 test also passes now with a tolerance threshold of 0.006, together with the suggested change 'mask_index_data.push_back(sequence_length)'. To make it pass on AIX, I will put these two changes behind an #ifdef flag (a sketch of such a guard follows).
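Something along these lines, as an illustrative sketch (the threshold values are from this thread; the constant name is hypothetical):

// Illustrative guard: AIX (big-endian) builds use the looser threshold that
// was observed to pass there; _AIX is the compiler-predefined macro on AIX.
#if defined(_AIX)
constexpr float kAttentionTestTolerance = 0.006f;
#else
constexpr float kAttentionTestTolerance = 0.002f;
#endif

The mask_index_data change can sit behind the same #if defined(_AIX) guard.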

We are trying to evaluate the performance of the two approaches.

@ranjitshs
Contributor

@skottmckay @tianleiwu @snnn @edgchen1
For enabling AIX plus the 2nd approach for this issue, there are around 28 changed files with 413 additions and 53 deletions.
Per the PR guidelines, these changes are big, so could you suggest a better way to raise the PR, or is it fine to raise them in a single PR?

@tianleiwu
Contributor

tianleiwu commented Jun 7, 2024

@ranjitshs, please use one PR for the selected approach (you can choose one based on your judgment), and optionally another PR for the other approach if you do not know which one is better. Big changes are fine. Thanks.

@ranjitshs
Contributor

@skottmckay @tianleiwu @snnn @edgchen1
Based on our (@tvkai, @ranjitshs) understanding, we are going ahead with approach 1. We have designed a wrapper utility API, SetRawDataInTensorProto, to handle the big-endian issue and replaced set_raw_data with this wrapper wherever onnxruntime sets raw data in a tensor proto. The changes are in #21133, along with the enablement for AIX.
Please take a look and suggest a better approach for our changes if you see one.
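For readers, a rough sketch of the wrapper's shape (the actual signatures and swap logic live in #21133 under onnxruntime/core/framework/tensorprotoutils.h/.cc; this is only to illustrate the pattern):

#include <cstddef>

#include "core/graph/onnx_protobuf.h"  // brings in ONNX_NAMESPACE::TensorProto

// Implemented in #21133; byte-swaps each element of raw_data in place.
void ConvertRawDataInTensorProto(ONNX_NAMESPACE::TensorProto& tensor_proto);

// Illustrative wrapper: store the raw bytes, then swap on big-endian hosts so
// the TensorProto always holds little-endian raw_data.
inline void SetRawDataInTensorProto(ONNX_NAMESPACE::TensorProto& tensor_proto,
                                    const void* raw_data, size_t raw_data_len) {
  tensor_proto.set_raw_data(raw_data, raw_data_len);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  ConvertRawDataInTensorProto(tensor_proto);
#endif
}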

@ranjitshs
Contributor

@skottmckay @tianleiwu @snnn @edgchen1
Hi all,
May I know the next step in the process for this issue and how we will take it forward? Please let me know.

@snnn
Member

snnn commented Jul 2, 2024

@yufenglee, the PR needs your review.

@tianleiwu
Contributor

The main challenge is to add an Azure CI pipeline that can test big endian. Otherwise, the code is not validated, and we cannot prevent future regressions.

One possible way is to run docker with QEMU emulation, like https://til.simonwillison.net/docker/emulate-s390x-with-qemu. That might need some additional changes to enable such a test pipeline.

@ranjitshs
Contributor

@tianleiwu
Is it possible to integrate the AIX OS into the Azure CI pipeline?

@tianleiwu
Contributor

@ranjitshs, since we do not have AIX in the CI pipeline, it would have to use QEMU. I suggest trying a docker image like https://github.com/WilliamHeaven/docker-aix to see whether it could work. @mszhanyi, could you help set up a pipeline?

@ranjitshs
Contributor

@tianleiwu
So if we provide a dedicated AIX system for CI activity, is it possible to integrate that system into the CI pipeline?

Also, may I know when the next release is?

@tianleiwu
Contributor

@ranjitshs, I have no experience setting up such a CI pipeline, and there is little information on the web about GitHub CI using AIX.

Even without integrating into the CI pipeline, it is easy to schedule a daily build yourself to make sure the main branch stays good for AIX builds. That would ensure build issues are found and resolved quickly.

The next release, 1.19, is tentatively feature-complete by July 19, with the release branch frozen by the end of July and the release in mid-August.

@ranjitshs
Contributor

@tianleiwu
Yes, we have a local CI setup configured to run the main branch along with the AIX patches.

Can we fast-track the review process? Generally, how much time does the review process take?

tianleiwu pushed a commit that referenced this issue Jul 17, 2024
Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)

### Description
Enablement of onnxruntime for AIX and fixing issues related to
big-endian platform.

### Motivation and Context
The changes in this PR contain:
1. Enablement code for building onnxruntime on the AIX operating system.
2. While testing the build on AIX, we found issues related to the big-endian platform. More details about a few of those issues can be found in [Big endian issue: Graph Transformation Attention Fusion tests are failing #12921](#12921).

Below is the list of files and a description of each change.
1.	cmake/CMakeLists.txt
[BUILDING on AIX issue] check for "IBMClang" is added for handling
-Wno-unused-parameter
2.	cmake/external/onnxruntime_external_deps.cmake
[BUILDING on AIX issue]Enabling gtest_disable_pthreads for AIX
3.	cmake/onnxruntime.cmake
[BUILDING on AIX issue]
o Blocking code for AIX which generates generated_source.c and further
requires some symbol files.
o	Putting NO AIX check for non-supported linker flags like --Xlinker
o	iconv linking
4.	cmake/onnxruntime_framework.cmake
[BUILDING on AIX issue]Putting NO AIX check for -Wl,-rpath='$ORIGIN'
5.	cmake/onnxruntime_mlas.cmake
[BUILDING on AIX issue] POWER10-related macro/function definitions.
6.	cmake/onnxruntime_providers_cpu.cmake
[BUILDING on AIX issue]Putting NO AIX check for non-supported linker
flags like --Xlinker
7.	cmake/onnxruntime_unittests.cmake
[BUILDING on AIX issue]
o	Putting NO AIX check for non-supported linker flags like --Xlinker
o Adding required libraries for the AIX linker under applications like
onnxruntime_shared_lib_test, onnxruntime_logging_apis, etc.
8.	cmake/patches/flatbuffers/flatbuffers.patch
[BUILDING on AIX issue] Handling of TypeCode in
include/flatbuffers/flatbuffers.h under AIX + clang
9.	onnxruntime/contrib_ops/cpu/murmur_hash3.cc
[Big endian issue] Byte-conversion handling in the compute() and getblock() routines
routines
10.	onnxruntime/contrib_ops/cpu/quantization/matmul_nbits_impl.cc
[Big endian issue] Handling of test failures; byte swapping for quant_value.
11.	onnxruntime/core/framework/tensorprotoutils.cc
[Big endian issue]
Implementation of SetRawDataInTensorProto and ConvertRawDataInTensorProto:
o SetRawDataInTensorProto: wrapper for set_raw_data(), calling
ConvertRawDataInTensorProto() on big-endian systems
o ConvertRawDataInTensorProto: function used mainly on big-endian
systems for byte-swapping of tensor raw_data
12.	onnxruntime/core/framework/tensorprotoutils.h
[Big endian issue]
Declaration of SetRawDataInTensorProto,  ConvertRawDataInTensorProto
13.	onnxruntime/core/graph/graph.cc
[Big endian issue]
 o	Call ConvertRawDataInTensorProto for SPARSE_TENSOR type
 o	Call ConvertRawDataInTensorProto for SaveToOrtFormat
14.	onnxruntime/core/mlas/lib/platform.cpp
[BUILDING on AIX issue] POWER10-related enablement for AIX
15.	onnxruntime/core/mlas/lib/power/qgemm_kernel_power10.cpp
[BUILDING on AIX issue]Handling of __vector under AIX+clang
16.	onnxruntime/core/mlas/lib/qgemm.h
[BUILDING on AIX issue] Adding _AIX flag
17.	onnxruntime/core/mlas/lib/qlmul.cpp
[BUILDING on AIX issue] Handling of __vector under AIX+clang
18.  onnxruntime/core/optimizer/attention_fusion.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
19.  onnxruntime/core/optimizer/compute_optimizer/shared_utils.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
20.  onnxruntime/core/optimizer/constant_folding.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
21.  onnxruntime/core/optimizer/embed_layer_norm_fusion.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
22.  onnxruntime/core/optimizer/nchwc_transformer.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
23.  onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
24.  onnxruntime/core/optimizer/qdq_transformer/qdq_s8_to_u8.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
25.  onnxruntime/core/optimizer/qdq_transformer/s8_to_u8.h
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
26.
onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_actions.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
27.  onnxruntime/core/optimizer/reshape_fusion.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
28.  onnxruntime/core/optimizer/stft_decomposition.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
29.
onnxruntime/core/optimizer/transpose_optimization/ort_optimizer_api_impl.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
30.	onnxruntime/core/platform/path_lib.h
[BUILDING on AIX issue] Moving to normal function call, instead of
template
31.	onnxruntime/core/platform/posix/env.cc
[BUILDING on AIX issue]Blocking syscall.h in AIX
32.	onnxruntime/core/session/inference_session.cc
[Big endian issue] Removing ORT_RETURN_IF_NOT, FLATBUFFERS_LITTLEENDIAN
33.	onnxruntime/test/flatbuffers/flatbuffer_utils_test.cc
[Big endian issue] Call ConvertRawDataInTensorProto in CreateInitializer
and ExternalWriteReadWithLoadInitializers
34.	onnxruntime/test/framework/sparse_kernels_test.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
35.	onnxruntime/test/framework/tensorutils_test.cc
[Big endian issue] Helper method ConvertEndianessForVector and call this
from required place.
36.	onnxruntime/test/framework/test_tensor_loader.cc
o [BUILDING on AIX issue] Handling of getcwd for AIX
o [Big endian issue] Byte swapping in run_external_data_test
37.	onnxruntime/test/onnx/main.cc
[Big endian issue] including <thread> for AIX
38.	onnxruntime/test/onnx/tensorprotoutils.cc
[Big endian issue] Byte swapping in UnpackTensorWithRawData
39.	onnxruntime/test/optimizer/graph_transform_test.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
40.	onnxruntime/test/optimizer/graph_transform_test_builder.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
41.	onnxruntime/test/optimizer/graph_transform_test_builder.h
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
42.	onnxruntime/test/optimizer/initializer_test.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
43.	onnxruntime/test/optimizer/nchwc_optimizer_test.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
44.	onnxruntime/test/providers/base_tester.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
45.	onnxruntime/test/providers/cpu/generator/random_test.cc
[BUILDING on AIX issue]  Adding AIX check in MultinomialGoodCase

---------

Co-authored-by: Vamshikrishna Thatikonda <[email protected]>
@ranjitshs
Contributor

Thanks @tianleiwu @snnn @liqunfu @yihonglyu @skottmckay @edgchen1 for providing suggestions and help on this issue/PR.
We will monitor our local CI setup and create issues if we see any build/test failures in the main branch.

@tvkai
Contributor Author

tvkai commented Jul 18, 2024

Thank you @tianleiwu @snnn @liqunfu @yihonglyu @skottmckay @edgchen1 for all the help.

@tvkai closed this as completed Jul 18, 2024
@ranjitshs
Contributor

@tianleiwu
Hi,
I see the 1.19.0 branch has been forked. Any idea about the release date?

@tianleiwu
Contributor

@ranjitshs
Contributor

@tianleiwu
Thanks for the info. I just noticed.
