Big endian issue: Graph Transformation Attention Fusion tests are failing #12921
The raw data initializers read from onnx or ort files are in LE format. They should be converted to BE on BE systems. But initializers that are input to any node are expected to be in LE before inference runs in RunInferencing() in graph.cc, since the onnx code expects the data in LE. The code above is a fusion test, so the initializer raw data has to be converted to BE either before or after the merge.
@snnn @tianleiwu @skottmckay @edgchen1, could you let us know your comments on this issue?
@tianleiwu, @snnn thanks a lot for giving your views on this issue. While testing on AIX, we observed the following places where initializers are getting added:
- A shape constant initializer is added to the graph in ConstantFoldShapeNode().
- An initializer is added to the graph in ExtractEmbedding().
- An initializer is added to the graph.
- ApiGraph::AddInitializer adds an initializer to the graph.

The above are the core runtime areas; there are similar places in the test folder as well. To avoid future misses we might need a better way to handle big-endian platform support. One approach is, on big-endian systems, to convert the Graph's raw_data to big endian.
@snnn @tianleiwu @skottmckay @edgchen1, any suggestions here?
We tried the approach mentioned by @tvkai above, i.e. converting the Graph's raw_data to big endian, and all onnxruntime_test_all test suites pass.
@madurais are you able to create a PR with the changes so they can be reviewed?
Yes Skott, I tried the changes and they work. I am making some more changes to avoid some errors, and will create a PR once done, by next week.
@skottmckay @tianleiwu @snnn @edgchen1 We tried both approaches on latest main.
Any suggestions on which approach we should be focusing on?
We can raise PRs for both approaches for review.
#2 looks good to me. For these two approaches, I suggest running a manual test that saves the optimized model to disk using session options, then loads it again with ORT to see whether the optimized model still works. Regarding AttentionTest.Attention_Mask1D_Fp32_B2_S64, you can adjust the tolerance threshold since it uses random inputs (so sometimes the difference might exceed the threshold). It might not be related to big endian (otherwise, similar tests like Attention_Mask2D_Fp32_B2_S32 and Attention_NoMask_Fp16 would also fail).
@tianleiwu
With 0.002f, 27834 entries were mismatched, but …
An example of the C++ API (there is another example for the C API on the same page). That could make the result more deterministic.
@tianleiwu
We are trying to evaluate the performance of the two approaches.
@skottmckay @tianleiwu @snnn @edgchen1
@ranjitshs, please use one PR for the selected approach (you can choose one based on your judgement), and optionally another PR for the other approach if you do not know which one is better. Big changes are fine. Thanks.
@skottmckay @tianleiwu @snnn @edgchen1
@skottmckay @tianleiwu @snnn @edgchen1
@yufenglee, the PR needs your review.
The main challenge is to add an Azure CI pipeline that can test big endian. Otherwise the code is not validated, and we cannot prevent regressions in the future. One possible way is to run docker as in https://til.simonwillison.net/docker/emulate-s390x-with-qemu. That might need some additional changes to enable such a test pipeline.
@tianleiwu
@ranjitshs, since we do not have AIX in the CI pipeline, it has to use QEMU. I suggest trying a docker setup like https://github.com/WilliamHeaven/docker-aix to see whether it could work. @mszhanyi, could you help set up a pipeline?
@tianleiwu Also, may I know when the next release is?
@ranjitshs, I have no experience setting up such a CI pipeline, and there is little information on the web about GitHub CI using AIX. Even without integrating into the CI pipeline, it is easy to schedule a daily build yourself to make sure the main branch stays good for the AIX build. That would ensure build issues are found and resolved quickly. The next release, 1.19, is tentatively feature complete by July 19, with a release branch freeze by end of July and release in mid-August.
@tianleiwu Can we fast-track the review process? Generally, how much time does the review process take?
…dian platform. (#21133)

### Description
Enablement of onnxruntime for AIX and fixes for issues related to the big-endian platform.

### Motivation and Context
Changes in this PR contain:
1. Enablement code for building onnxruntime on the AIX operating system.
2. While testing the build on AIX, we found issues related to the big-endian platform. More details about a few of those issues can be found in [Big endian issue: Graph Transformation Attention Fusion tests are failing #12921](#12921)

Below is the list of files and a description of each change.
1. cmake/CMakeLists.txt [BUILDING on AIX issue] A check for "IBMClang" is added for handling -Wno-unused-parameter.
2. cmake/external/onnxruntime_external_deps.cmake [BUILDING on AIX issue] Enabling gtest_disable_pthreads for AIX.
3. cmake/onnxruntime.cmake [BUILDING on AIX issue]
   - Blocking code for AIX which generates generated_source.c and further requires some symbol files.
   - Adding a no-AIX check for unsupported linker flags like --Xlinker.
   - iconv linking.
4. cmake/onnxruntime_framework.cmake [BUILDING on AIX issue] Adding a no-AIX check for -Wl,-rpath='$ORIGIN'.
5. cmake/onnxruntime_mlas.cmake [BUILDING on AIX issue] POWER10-related macro/function definitions.
6. cmake/onnxruntime_providers_cpu.cmake [BUILDING on AIX issue] Adding a no-AIX check for unsupported linker flags like --Xlinker.
7. cmake/onnxruntime_unittests.cmake [BUILDING on AIX issue]
   - Adding a no-AIX check for unsupported linker flags like --Xlinker.
   - Adding required libraries for the AIX linker under applications like onnxruntime_shared_lib_test, onnxruntime_logging_apis, etc.
8. cmake/patches/flatbuffers/flatbuffers.patch [BUILDING on AIX issue] Handling of TypeCode in include/flatbuffers/flatbuffers.h under AIX + clang.
9. onnxruntime/contrib_ops/cpu/murmur_hash3.cc [Big endian issue] Byte-conversion handling in the compute() and getblock() routines.
10. onnxruntime/contrib_ops/cpu/quantization/matmul_nbits_impl.cc [Big endian issue] Handling of test failures; byte swapping for quant_value.
11. onnxruntime/core/framework/tensorprotoutils.cc [Big endian issue] Implementation of SetRawDataInTensorProto and ConvertRawDataInTensorProto.
    - SetRawDataInTensorProto: wrapper for set_raw_data(), calling ConvertRawDataInTensorProto() on big-endian systems.
    - ConvertRawDataInTensorProto: function used mainly on big-endian systems for byte-swapping of tensor raw_data.
12. onnxruntime/core/framework/tensorprotoutils.h [Big endian issue] Declaration of SetRawDataInTensorProto and ConvertRawDataInTensorProto.
13. onnxruntime/core/graph/graph.cc [Big endian issue]
    - Call ConvertRawDataInTensorProto for the SPARSE_TENSOR type.
    - Call ConvertRawDataInTensorProto for SaveToOrtFormat.
14. onnxruntime/core/mlas/lib/platform.cpp [BUILDING on AIX issue] POWER10-related enablement for AIX.
15. onnxruntime/core/mlas/lib/power/qgemm_kernel_power10.cpp [BUILDING on AIX issue] Handling of __vector under AIX + clang.
16. onnxruntime/core/mlas/lib/qgemm.h [BUILDING on AIX issue] Adding the _AIX flag.
17. onnxruntime/core/mlas/lib/qlmul.cpp [BUILDING on AIX issue] Handling of __vector under AIX + clang.
18. onnxruntime/core/optimizer/attention_fusion.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
19. onnxruntime/core/optimizer/compute_optimizer/shared_utils.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
20. onnxruntime/core/optimizer/constant_folding.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
21. onnxruntime/core/optimizer/embed_layer_norm_fusion.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
22. onnxruntime/core/optimizer/nchwc_transformer.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
23. onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
24. onnxruntime/core/optimizer/qdq_transformer/qdq_s8_to_u8.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
25. onnxruntime/core/optimizer/qdq_transformer/s8_to_u8.h [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
26. onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_actions.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
27. onnxruntime/core/optimizer/reshape_fusion.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
28. onnxruntime/core/optimizer/stft_decomposition.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
29. onnxruntime/core/optimizer/transpose_optimization/ort_optimizer_api_impl.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
30. onnxruntime/core/platform/path_lib.h [BUILDING on AIX issue] Moving to a normal function call instead of a template.
31. onnxruntime/core/platform/posix/env.cc [BUILDING on AIX issue] Blocking syscall.h on AIX.
32. onnxruntime/core/session/inference_session.cc [Big endian issue] Removing ORT_RETURN_IF_NOT, FLATBUFFERS_LITTLEENDIAN.
33. onnxruntime/test/flatbuffers/flatbuffer_utils_test.cc [Big endian issue] Call ConvertRawDataInTensorProto in CreateInitializer and ExternalWriteReadWithLoadInitializers.
34. onnxruntime/test/framework/sparse_kernels_test.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
35. onnxruntime/test/framework/tensorutils_test.cc [Big endian issue] Helper method ConvertEndianessForVector, called from the required places.
36. onnxruntime/test/framework/test_tensor_loader.cc
    - [BUILDING on AIX issue] Handling of getcwd for AIX.
    - [Big endian issue] Byte swapping in run_external_data_test.
37. onnxruntime/test/onnx/main.cc [Big endian issue] Including `<thread>` for AIX.
38. onnxruntime/test/onnx/tensorprotoutils.cc [Big endian issue] Byte swapping in UnpackTensorWithRawData.
39. onnxruntime/test/optimizer/graph_transform_test.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
40. onnxruntime/test/optimizer/graph_transform_test_builder.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
41. onnxruntime/test/optimizer/graph_transform_test_builder.h [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
42. onnxruntime/test/optimizer/initializer_test.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
43. onnxruntime/test/optimizer/nchwc_optimizer_test.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
44. onnxruntime/test/providers/base_tester.cc [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data.
45. onnxruntime/test/providers/cpu/generator/random_test.cc [BUILDING on AIX issue] Adding an AIX check in MultinomialGoodCase.

Co-authored-by: Vamshikrishna Thatikonda <[email protected]>
Thanks @tianleiwu @snnn @liqunfu @yihonglyu @skottmckay @edgchen1 for providing suggestions/help on this issue/PR.
Thank you @tianleiwu @snnn @liqunfu @yihonglyu @skottmckay @edgchen1 for all the help.
@tianleiwu
@ranjitshs, 1.19 was released: https://github.com/microsoft/onnxruntime/releases/tag/v1.19.0
@tianleiwu
Describe the issue
Basically, I am observing on AIX, which is a big-endian platform, that the following tests fail when onnxruntime is built from source:
1: [ FAILED ] GraphTransformationTests.AttentionFusionInt32Test
1: [ FAILED ] GraphTransformationTests.AttentionFusionInt64Test
1: [ FAILED ] GraphTransformationTests.AttentionFusionFloat32Test
The underlying issue seems to be that onnxruntime adds initializers to the graph during transformation of the graph. The initializers that get added contain raw_data. According to the current implementation, the graph should hold little-endian data, and whenever the data is processed, UnpackTensors takes care of the machine's endianness. The issue is that on a big-endian platform the data is already in BE, which makes the graph contain BE data instead of LE.
Below is the place where the actual issue seems to be:
In the file : onnxruntime/core/optimizer/attention_fusion.cc
Function: MergeQkvWeights
```cpp
if (data_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT) {
  const float* q_weight = q_initializer.data<float>();
  const float* k_weight = k_initializer.data<float>();
  const float* v_weight = v_initializer.data<float>();
  std::vector<float> result;
  result.reserve(gsl::narrow<size_t>(element_count));
  if (is_matmul) {
    MergeMatMulWeights(q_weight, k_weight, v_weight, result, hidden_size);
  } else {
    MergeWeights(q_weight, k_weight, v_weight, result, hidden_size);
  }
  // <=== Here we set the raw_data using BE data when run on a big-endian platform.
  initializer.set_raw_data(result.data(), gsl::narrow<size_t>(element_count) * sizeof(float));
```
The same issue exists in the else block: the initializer is formed with BE data and added to the graph.
One way to fix this is to do byte swapping before setting the raw_data on big-endian platforms.
I tested byte-swapping code before the raw data is actually set, and the tests work fine.
To reproduce
Run graphtransformation on a Big Endian platform like AIX on IBM Power.
Urgency
No response
Platform
Other / Unknown
OS Version
AIX
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
latest
ONNX Runtime API
C
Architecture
IBM Power
Execution Provider
Default CPU
Execution Provider Library Version
No response