
Big endian issue: Graph Transformation Attention Fusion tests are failing #12921

Closed
tvkai opened this issue Sep 10, 2022 · 30 comments
Labels
model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.

Comments

@tvkai
Contributor

tvkai commented Sep 10, 2022

Describe the issue

On AIX, which is a big-endian platform, I am observing that the following tests fail when onnxruntime is built from source.

1: [ FAILED ] GraphTransformationTests.AttentionFusionInt32Test
1: [ FAILED ] GraphTransformationTests.AttentionFusionInt64Test
1: [ FAILED ] GraphTransformationTests.AttentionFusionFloat32Test

The underlying issue seems to be that onnxruntime adds initializers to the graph during graph transformation. These initializers contain raw_data. Under the current implementation, the graph should hold little-endian data, and whenever the data is processed, UnpackTensors takes care of the machine's endianness. The problem is that on a big-endian platform the data is already in BE, so the graph ends up containing BE data instead of LE.

Below is the place where the actual issue seems to be.

In the file: onnxruntime/core/optimizer/attention_fusion.cc
Function: MergeQkvWeights

if (data_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT) {
  const float* q_weight = q_initializer.data<float>();
  const float* k_weight = k_initializer.data<float>();
  const float* v_weight = v_initializer.data<float>();
  std::vector<float> result;
  result.reserve(gsl::narrow<size_t>(element_count));
  if (is_matmul) {
    MergeMatMulWeights(q_weight, k_weight, v_weight, result, hidden_size);
  } else {
    MergeWeights(q_weight, k_weight, v_weight, result, hidden_size);
  }
  initializer.set_raw_data(result.data(), gsl::narrow<size_t>(element_count) * sizeof(float)); // <=== Here we set raw_data with BE data when running on a big-endian platform.

The else block has the same issue: the initializer is formed with BE data and added to the graph.

One way to fix this issue is to do byte swapping before we set the raw_data on big-endian platforms.
I tested byte-swapping the data just before setting the raw data, and the tests pass.
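For illustration, here is a minimal sketch of such a swap as a standalone helper (the name is hypothetical; the fix that eventually landed uses a shared utility, described later in this thread):

#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reverse the bytes of each element in a raw buffer so
// that a big-endian host stores little-endian bytes in the TensorProto's
// raw_data. element_size is sizeof(float), sizeof(int64_t), etc.
inline void SwapBytesToLittleEndian(void* data, size_t byte_count, size_t element_size) {
  auto* bytes = static_cast<uint8_t*>(data);
  for (size_t i = 0; i + element_size <= byte_count; i += element_size) {
    std::reverse(bytes + i, bytes + i + element_size);
  }
}

Called just before the set_raw_data line above, guarded to big-endian builds (e.g. with #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__), this would leave the graph holding LE bytes as UnpackTensors expects.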

To reproduce

Run the graph transformation tests on a big-endian platform like AIX on IBM Power.

Urgency

No response

Platform

Other / Unknown

OS Version

AIX

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

latest

ONNX Runtime API

C

Architecture

IBM Power

Execution Provider

Default CPU

Execution Provider Library Version

No response

@madurais
Contributor

The raw_data initializers read from onnx or ort files are in LE format. They should be converted to BE on BE systems. But the initializers that are input to any node are expected to be in LE before inference runs in RunInferencing() in graph.cc, since the onnx code expects the data in LE.

In the above code it is a fusion test, so the initializer raw data has to be byte-swapped on BE platforms, either before the merge or after it.

@tvkai
Contributor Author

tvkai commented Sep 12, 2022

@snnn @tianleiwu @skottmckay @edgchen1, could you let us know your comments on this issue?

@tianleiwu
Contributor

tianleiwu commented Sep 12, 2022

@tvkai, thanks for reporting the issue and identifying the root cause. Since you have a fix, could you please submit a pull request?

@snnn, could we add a build pipeline on a big-endian platform? We could run it occasionally (e.g., before each release) to avoid similar issues.

@tvkai
Contributor Author

tvkai commented Sep 14, 2022

@tianleiwu, @snnn, thanks a lot for your views on this issue. While testing on AIX, we observed multiple other places in the code that add initializers with raw_data to the graph. We may need to fix all the places where byte swapping is needed.

The initializers are added in the following places:

  1. onnxruntime/core/optimizer/attention_fusion.cc

     An initializer is added to the graph in MergeQkvWeights().

  2. onnxruntime/core/optimizer/constant_folding.cc

     A shape_constant initializer is added to the graph in ConstantFoldShapeNode().

  3. onnxruntime/core/optimizer/embed_layer_norm_fusion.cc

     An initializer is added to the graph in ExtractEmbedding().

  4. onnxruntime/core/optimizer/transpose_optimizer/transpose_optimizer.cc

     An initializer is added to the graph.

  5. onnxruntime/core/optimizer/transpose_optimizer/optimizer_api_impl.cc

     An initializer is added to the graph in ApiGraph::AddInitializer().

The above are the core runtime areas; there are similar places in the test folder as well.
We may need a common way to convert and add initializers to the graph.

To avoid future misses, we might need a better way to handle big-endian platform support.

One approach is, on big-endian systems, to convert the graph's raw_data to big-endian values when loading from a model file; the existing UnpackTensors could then skip the conversion (skip ReadLittleEndian()). One issue with this approach is RunInferencing() (schema->GetTypeAndShapeInferenceFunction()(*this)), which invokes ONNX namespace routines that always expect raw data in little-endian format. However, the advantage of this approach for onnxruntime is fewer changes across the code, and in the future we would be less likely to miss places like those above.

@tvkai
Contributor Author

tvkai commented Sep 15, 2022

@snnn @tianleiwu @skottmckay @edgchen1, any suggestions here?

@madurais
Contributor

We tried the approach mentioned above by @tvkai, i.e., converting the graph's raw_data to big-endian values when loading from a model file so that the existing UnpackTensors can skip the conversion (skip ReadLittleEndian()). We also changed the code to convert a node's input initializers back to little-endian before calling RunInferencing(), so that input initializers are in little-endian as the onnx code expects.

All onnxruntime_test_all test suites pass.

@skottmckay
Contributor

@madurais are you able to create a PR with the changes so they can be reviewed?

@madurais
Contributor

Yes, Skott. I tried the changes and they work. I am making some more changes to avoid some errors, and will create a PR by next week.

@tvkai
Contributor Author

tvkai commented May 15, 2024

@skottmckay @tianleiwu @snnn @edgchen1

We tried both approaches on latest main.

  1. Whenever we set raw_data in the graph, we convert the data to big-endian on big-endian platforms. We identified all the potential places (in core as well as test code) for this change, and all tests (C and C++) pass.
  2. The other approach converts the graph's raw_data to big-endian values when loading from a model file, so the existing unpack tensors can skip the conversion. But we are observing one test failure (AttentionTest.Attention_Mask1D_Fp32_B2_S64).

Any suggestions on which approach we should focus on?

@ranjitshs
Contributor

We can raise PRs for both approaches for review.

@tianleiwu
Contributor

tianleiwu commented May 15, 2024

> @skottmckay @tianleiwu @snnn @edgchen1
>
> We tried both approaches on latest main.
>
> 1. Whenever we set raw_data in the graph, we convert the data to big-endian on big-endian platforms. We identified all the potential places (in core as well as test code) for this change, and all tests (C and C++) pass.
> 2. The other approach converts the graph's raw_data to big-endian values when loading from a model file, so the existing unpack tensors can skip the conversion. But we are observing one test failure (AttentionTest.Attention_Mask1D_Fp32_B2_S64).
>
> Any suggestions on which approach we should focus on?

#2 looks good to me.

For these two approaches, I suggest running a manual test that saves the optimized model to disk using session options and then loads it again with ORT, to see whether the optimized model still works.

Regarding AttentionTest.Attention_Mask1D_Fp32_B2_S64, you can adjust the tolerance threshold, since it uses random inputs (so sometimes the difference might exceed the threshold). It might not be related to big endian (otherwise, similar tests like Attention_Mask2D_Fp32_B2_S32 and Attention_NoMask_Fp16 would also fail).

@ranjitshs
Contributor

@tianleiwu
Thanks for the reply and the suggestions.

  1. I am not able to find a C/C++ API with which I can save the model to disk. Could you please point us to a reference for the API usage?

  2. Regarding AttentionTest.Attention_Mask1D_Fp32_B2_S64, I modified the tolerance threshold from 0.002 to 0.005; with this value it is still failing.

With 0.002f, 27834 entries were mismatched, but with 0.005f, 27337 entries mismatch. The total number of entries in the output array is 98304.

@snnn added the model:transformer label on May 16, 2024
@tianleiwu
Contributor

> 1. I am not able to find a C/C++ API with which I can save the model to disk. Could you please point us to a reference for the API usage?
> 2. Regarding AttentionTest.Attention_Mask1D_Fp32_B2_S64, I modified the tolerance threshold from 0.002 to 0.005; with this value it is still failing.

@ranjitshs,

For the first question, here is an example of the C++ API (there is another example for the C API on the same page):
https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#c-api-example-2
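Following that pattern, a minimal sketch (file paths are placeholders; the narrow-char paths assume a non-Windows build where ORTCHAR_T is char):

#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "save_optimized");
  Ort::SessionOptions session_options;
  session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
  // Serialize the optimized graph to disk while the session is created.
  session_options.SetOptimizedModelFilePath("optimized_model.onnx");
  Ort::Session session(env, "model.onnx", session_options);
  // Re-loading optimized_model.onnx in a fresh session then verifies that the
  // serialized initializers round-trip correctly on a big-endian host.
  return 0;
}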

For the second question, could you try changing one line of the test code of Attention_Mask1D_Fp32_B2_S64 from

mask_index_data.push_back(i == 0 ? sequence_length : (sequence_length / 2));

to

mask_index_data.push_back(sequence_length);

That could make the result more deterministic.

@ranjitshs
Contributor

@tianleiwu
Thanks for the link and the code-change suggestion.

  1. I am able to save the optimized model using SetOptimizedModelFilePath() and reload it properly.

  2. The Attention_Mask1D_Fp32_B2_S64 test also passes now with a tolerance threshold of 0.006, together with the suggested change 'mask_index_data.push_back(sequence_length)'. To make it pass on AIX, I will put these two changes behind an #ifdef flag (a sketch of such a guard follows).
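Something along these lines, as an illustrative sketch (the threshold values are from this thread; the constant name is hypothetical):

// Illustrative guard: AIX (big-endian) builds use the looser threshold that
// was observed to pass there; _AIX is the compiler-predefined macro on AIX.
#if defined(_AIX)
constexpr float kAttentionTestTolerance = 0.006f;
#else
constexpr float kAttentionTestTolerance = 0.002f;
#endif

The mask_index_data change can sit behind the same #if defined(_AIX) guard.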

We are trying to evaluate the performance of the two approaches.

@ranjitshs
Contributor

@skottmckay @tianleiwu @snnn @edgchen1
For enabling AIX plus the 2nd approach for this issue, there are around 28 changed files with 413 additions and 53 deletions.
Per the PR guidelines, these changes are big, so could you suggest a better way to raise the PR, or is it fine to raise them in a single PR?

@tianleiwu
Contributor

tianleiwu commented Jun 7, 2024

@ranjitshs, please use one PR for the selected approach (you can choose one based on your judgment), and optionally another PR for the other approach if you do not know which one is better. Big changes are fine. Thanks.

@ranjitshs
Contributor

@skottmckay @tianleiwu @snnn @edgchen1
Based on our (@tvkai, @ranjitshs) understanding, we are going ahead with approach 1. We have designed a wrapper utility API, SetRawDataInTensorProto, to handle the big-endian issue and replaced set_raw_data with this wrapper wherever onnxruntime sets raw data in a tensor proto. The changes are in #21133, along with the enablement for AIX.
Please take a look and suggest a better approach for our changes if you see one.
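For readers, a rough sketch of the wrapper's shape (the actual signatures and swap logic live in #21133 under onnxruntime/core/framework/tensorprotoutils.h/.cc; this is only to illustrate the pattern):

#include <cstddef>

#include "core/graph/onnx_protobuf.h"  // brings in ONNX_NAMESPACE::TensorProto

// Implemented in #21133; byte-swaps each element of raw_data in place.
void ConvertRawDataInTensorProto(ONNX_NAMESPACE::TensorProto& tensor_proto);

// Illustrative wrapper: store the raw bytes, then swap on big-endian hosts so
// the TensorProto always holds little-endian raw_data.
inline void SetRawDataInTensorProto(ONNX_NAMESPACE::TensorProto& tensor_proto,
                                    const void* raw_data, size_t raw_data_len) {
  tensor_proto.set_raw_data(raw_data, raw_data_len);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  ConvertRawDataInTensorProto(tensor_proto);
#endif
}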

@ranjitshs
Contributor

@skottmckay @tianleiwu @snnn @edgchen1
Hi all,
May I know the next step in the process for this issue and how we will take it forward? Please let me know.

@snnn
Member

snnn commented Jul 2, 2024

@yufenglee, the PR needs your review.

@tianleiwu
Contributor

The main challenge is to add an Azure CI pipeline that can test big endian. Otherwise, the code is not validated, and we cannot prevent future regressions.

One possible way is to run docker with QEMU emulation, like https://til.simonwillison.net/docker/emulate-s390x-with-qemu. That might need some additional changes to enable such a test pipeline.

@ranjitshs
Contributor

@tianleiwu
Is it possible to integrate the AIX OS into the Azure CI pipeline?

@tianleiwu
Contributor

@ranjitshs, since we do not have AIX in the CI pipeline, it would have to use QEMU. I suggest trying a docker image like https://github.com/WilliamHeaven/docker-aix to see whether it could work. @mszhanyi, could you help set up a pipeline?

@ranjitshs
Contributor

@tianleiwu
So if we provide a dedicated AIX system for CI activity, is it possible to integrate that system into the CI pipeline?

Also, may I know when the next release is?

@tianleiwu
Contributor

@ranjitshs, I have no experience setting up such a CI pipeline, and there is little information on the web about GitHub CI using AIX.

Even without integrating into the CI pipeline, it is easy to schedule a daily build yourself to make sure the main branch stays good for AIX builds. That would ensure build issues are found and resolved quickly.

The next release, 1.19, is tentatively feature-complete by July 19, with the release branch frozen by the end of July and the release in mid-August.

@ranjitshs
Contributor

@tianleiwu
Yes, we have a local CI setup configured to run the main branch along with the AIX patches.

Can we fast-track the review process? Generally, how much time does the review process take?

tianleiwu pushed a commit that referenced this issue Jul 17, 2024
Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)

### Description
Enablement of onnxruntime for AIX and fixing issues related to
big-endian platform.

### Motivation and Context
The changes in this PR contain:
1. Enablement code for building onnxruntime on the AIX operating system.
2. While testing the build on AIX, we found issues related to the big-endian platform. More details about a few of those issues can be found in [Big endian issue: Graph Transformation Attention Fusion tests are failing #12921](#12921).

Below is the list of files and a description of each change.
1.	cmake/CMakeLists.txt
[BUILDING on AIX issue] check for "IBMClang" is added for handling
-Wno-unused-parameter
2.	cmake/external/onnxruntime_external_deps.cmake
[BUILDING on AIX issue]Enabling gtest_disable_pthreads for AIX
3.	cmake/onnxruntime.cmake
[BUILDING on AIX issue]
o Blocking code for AIX which generates generated_source.c and further
requires some symbol files.
o	Putting NO AIX check for non-supported linker flags like --Xlinker
o	iconv linking
4.	cmake/onnxruntime_framework.cmake
[BUILDING on AIX issue]Putting NO AIX check for -Wl,-rpath='$ORIGIN'
5.	cmake/onnxruntime_mlas.cmake
[BUILDING on AIX issue] POWER10-related macro/function definitions.
6.	cmake/onnxruntime_providers_cpu.cmake
[BUILDING on AIX issue]Putting NO AIX check for non-supported linker
flags like --Xlinker
7.	cmake/onnxruntime_unittests.cmake
[BUILDING on AIX issue]
o	Putting NO AIX check for non-supported linker flags like --Xlinker
o Adding required libraries for the AIX linker under applications like
onnxruntime_shared_lib_test, onnxruntime_logging_apis, etc.
8.	cmake/patches/flatbuffers/flatbuffers.patch
[BUILDING on AIX issue] Handling of TypeCode in
include/flatbuffers/flatbuffers.h under AIX + clang
9.	onnxruntime/contrib_ops/cpu/murmur_hash3.cc
[Big endian issue] Byte-conversion handling in the compute() and getblock() routines
routines
10.	onnxruntime/contrib_ops/cpu/quantization/matmul_nbits_impl.cc
[Big endian issue] Handling of test failures; byte swapping for quant_value.
11.	onnxruntime/core/framework/tensorprotoutils.cc
[Big endian issue]
Implementation of SetRawDataInTensorProto and ConvertRawDataInTensorProto:
o SetRawDataInTensorProto: wrapper for set_raw_data(), calling
ConvertRawDataInTensorProto() on big-endian systems
o ConvertRawDataInTensorProto: function used mainly on big-endian
systems for byte-swapping of tensor raw_data
12.	onnxruntime/core/framework/tensorprotoutils.h
[Big endian issue]
Declaration of SetRawDataInTensorProto,  ConvertRawDataInTensorProto
13.	onnxruntime/core/graph/graph.cc
[Big endian issue]
 o	Call ConvertRawDataInTensorProto for SPARSE_TENSOR type
 o	Call ConvertRawDataInTensorProto for SaveToOrtFormat
14.	onnxruntime/core/mlas/lib/platform.cpp
[BUILDING on AIX issue] POWER10-related enablement for AIX
15.	onnxruntime/core/mlas/lib/power/qgemm_kernel_power10.cpp
[BUILDING on AIX issue]Handling of __vector under AIX+clang
16.	onnxruntime/core/mlas/lib/qgemm.h
[BUILDING on AIX issue] Adding _AIX flag
17.	onnxruntime/core/mlas/lib/qlmul.cpp
[BUILDING on AIX issue] Handling of __vector under AIX+clang
18.  onnxruntime/core/optimizer/attention_fusion.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
19.  onnxruntime/core/optimizer/compute_optimizer/shared_utils.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
20.  onnxruntime/core/optimizer/constant_folding.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
21.  onnxruntime/core/optimizer/embed_layer_norm_fusion.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
22.  onnxruntime/core/optimizer/nchwc_transformer.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
23.  onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
24.  onnxruntime/core/optimizer/qdq_transformer/qdq_s8_to_u8.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
25.  onnxruntime/core/optimizer/qdq_transformer/s8_to_u8.h
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
26.
onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_actions.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
27.  onnxruntime/core/optimizer/reshape_fusion.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
28.  onnxruntime/core/optimizer/stft_decomposition.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
29.
onnxruntime/core/optimizer/transpose_optimization/ort_optimizer_api_impl.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
30.	onnxruntime/core/platform/path_lib.h
[BUILDING on AIX issue] Moving to normal function call, instead of
template
31.	onnxruntime/core/platform/posix/env.cc
[BUILDING on AIX issue]Blocking syscall.h in AIX
32.	onnxruntime/core/session/inference_session.cc
[Big endian issue] Removing ORT_RETURN_IF_NOT, FLATBUFFERS_LITTLEENDIAN
33.	onnxruntime/test/flatbuffers/flatbuffer_utils_test.cc
[Big endian issue] Call ConvertRawDataInTensorProto in CreateInitializer
and ExternalWriteReadWithLoadInitializers
34.	onnxruntime/test/framework/sparse_kernels_test.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
35.	onnxruntime/test/framework/tensorutils_test.cc
[Big endian issue] Helper method ConvertEndianessForVector and call this
from required place.
36.	onnxruntime/test/framework/test_tensor_loader.cc
o [BUILDING on AIX issue] Handling of getcwd for AIX
o [Big endian issue] Byte swapping in run_external_data_test
37.	onnxruntime/test/onnx/main.cc
[Big endian issue] including <thread> for AIX
38.	onnxruntime/test/onnx/tensorprotoutils.cc
[Big endian issue] Byte swapping in UnpackTensorWithRawData
39.	onnxruntime/test/optimizer/graph_transform_test.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
40.	onnxruntime/test/optimizer/graph_transform_test_builder.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
41.	onnxruntime/test/optimizer/graph_transform_test_builder.h
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
42.	onnxruntime/test/optimizer/initializer_test.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
43.	onnxruntime/test/optimizer/nchwc_optimizer_test.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
44.	onnxruntime/test/providers/base_tester.cc
[Big endian issue] Use util function SetRawDataInTensorProto, instead of
set_raw_data
45.	onnxruntime/test/providers/cpu/generator/random_test.cc
[BUILDING on AIX issue]  Adding AIX check in MultinomialGoodCase

---------

Co-authored-by: Vamshikrishna Thatikonda <[email protected]>
@ranjitshs
Contributor

Thanks @tianleiwu @snnn @liqunfu @yihonglyu @skottmckay @edgchen1 for providing suggestions and help on this issue/PR.
We will monitor our local CI setup and create issues if we see any build/test failures in the main branch.

@tvkai
Contributor Author

tvkai commented Jul 18, 2024

Thank you @tianleiwu @snnn @liqunfu @yihonglyu @skottmckay @edgchen1 for all the help.

@tvkai closed this as completed Jul 18, 2024
@ranjitshs
Contributor

@tianleiwu
Hi,
I see the 1.19.0 branch has been forked. Any idea about the release date?

@tianleiwu
Contributor

@ranjitshs
Contributor

@tianleiwu
Thanks for the info. I just noticed.
