
[Good First Issue] [Snippets] [ARM]: Enable FakeQuantize tokenization #28508

Open
a-sidorova opened this issue Jan 17, 2025 · 7 comments · May be fixed by #28700

Labels: category: CPU (OpenVINO CPU plugin) · good first issue (Good for newcomers) · platform: arm (OpenVINO on ARM / ARM64)
@a-sidorova (Contributor)

Context

Snippets is a highly specialized JIT (just-in-time) compiler for computational graphs. It provides a scalable approach to operation fusion and enablement. Like a typical compiler, Snippets has a frontend (tokenizer), an optimizer, and a backend (generator).

The first step of the Snippets pipeline, tokenization, identifies the parts of the initial ov::Model that Snippets can lower efficiently and tokenizes them into a single node - a Subgraph.
The second step (the optimizer) applies common and device-specific optimizations to the Subgraph and produces a lowered representation of it.
Finally, the last stage is code emission. The target generator maps every operation in the IR to a binary code emitter (JIT emitter), which is then used to produce a piece of executable code. The result is an executable that performs the calculations described by the initial input ov::Model.

The purpose of this issue is to enable tokenization of the FakeQuantize operation in Snippets for ARM devices.
Snippets decomposes FakeQuantize into several simple elementwise operations using the FakeQuantizeDecomposition pass. This pass is called after the op has been tokenized into a Subgraph.

Prerequisites

It is recommended to use an ARM CPU based platform for development (e.g. a Mac, a Raspberry Pi, etc.). Cross-compilation together with an emulator (e.g. QEMU) is also an option: cmake -DCMAKE_TOOLCHAIN_FILE=../cmake/arm64.toolchain.cmake ...

What needs to be done?

  • First, enable the tests that are currently disabled on aarch64 platforms. To do that, update this line to enable the smoke*Snippets*_FQDecomposition_* tests. Launch the tests (see the Tests section below for how to run them). Some tests should fail.
  • Support all the missing operations encountered during the decomposition pass in the ARM64 code generation.
  • Enable FakeQuantize tokenization in the tokenizer callback in the CPU plugin - update is_supported_op.
  • Launch the tests again - they should now be green.

Tests

Tests are disabled in the default build, so make sure to add -DENABLE_TESTS=ON to the cmake command.

GoogleTest is used for testing. The CPU functional test target is ov_cpu_func_tests. You can use a GoogleTest filter:

./bin/[platform]/[build_type]/ov_cpu_func_tests --gtest_filter="*smoke*Snippets*FQDecomposition*"

Contact points

@a-sidorova, @dmitry-gorokhov

@a-sidorova a-sidorova added category: CPU OpenVINO CPU plugin good first issue Good for newcomers platform: arm OpenVINO on ARM / ARM64 labels Jan 17, 2025
@github-project-automation github-project-automation bot moved this to Contributors Needed in Good first issues Jan 17, 2025
@srinjoydutta03 (Contributor)

.take

Contributor

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

@a-sidorova a-sidorova moved this from Contributors Needed to Assigned in Good first issues Jan 20, 2025
@srinjoydutta03 (Contributor) commented Jan 24, 2025

Hi @a-sidorova, I ran the tests and, as expected, they failed. However, I don't fully understand the test output.

Here is one of the tests for reference:

smoke_Snippets_FQDecomposition_Scalars/FakeQuantizeDecompositionTest.CompareWithRefImpl/IS=(1.3.16.16)_netPRC=f32_D=CPU_IN=f32_OP=Abs_opset1_ON1=Subgraph_ON1=Abs,fakeQuantize_LP=1SH1=[]SH2=[]SH3=[]SH4=[]
src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp:52: Failure
Expected equality of these values:
  originalLayersNames
    Which is: "Abs,fakeQuantize"
  name
    Which is: "Abs"
src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp:37: Failure
Expected equality of these values:
  ref_num_nodes
    Which is: 4
  num_nodes
    Which is: 3
Compiled model contains invalid number of nodes.

  • From the log, I understand that the fakeQuantize node is being decomposed but not fused into the Subgraph. I'm also confused about why num_nodes is 3 while only "Abs" is recognized. I would think one node is "Abs"; the other two are not logged (I would guess they are the Maximum and Minimum nodes from the decomposition pass).
  • As far as supporting missing operations is concerned, on inspecting the decomposition pass here, I found that all the operations (Maximum, Minimum, Add, Subtract, Multiply, Round, ConvertSaturation and Divide) already have corresponding jitters in cpu_generator. I'm not sure which operation is missing during the code generation phase.
  • Since all the operations already have corresponding jitters, I thought that including ov::is_type<ov::op::v0::FakeQuantize>(n) in is_supported_op in the transformation pipeline should make it work, but it doesn't.

I'm sure I am missing something, please guide me through this. Thanks.

@a-sidorova (Contributor, Author)

@srinjoydutta03 thank you for the questions!

As far as supporting missing operations is concerned, on inspecting the decomposition pass here, I found all the operations (Maximum, Minimum, Add, Subtract, Multiply, Round, ConvertSaturation and Divide) already have their corresponding jitters in cpu_generator. I'm not sure which operation is missing during code generation phase

You're absolutely right - all emitters are already implemented. If an emitter were missing, you would see the following exception from the target machine/generator: Check 'jitter != jitters.end()' failed at src/common/snippets/src/lowered/target_machine.cpp:19: Supported precisions set is not available for X operation. It means that the CPU generator knows nothing about operation X and doesn't know how to compile the code or which precisions are supported. Since you don't see this exception, the CPU generator supports everything needed.

Also, let me help you with the test logs:

src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp:52: Failure
Expected equality of these values:
  originalLayersNames
    Which is: "Abs,fakeQuantize"
  name
    Which is: "Abs"

It means the test expects that the execution model (the state after ov::Model compilation) will contain a Subgraph op containing the Abs and FakeQuantize ops (what was originally tokenized by Snippets). The test shows that currently the Subgraph has only Abs, and FakeQuantize has not been tokenized. This part of the failure should be fixed by adding ov::is_type<ov::op::v0::FakeQuantize>(n) to is_supported_op in the transformation pipeline, as you already did 😃

src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp:37: Failure
Expected equality of these values:
  ref_num_nodes
    Which is: 4
  num_nodes
    Which is: 3
Compiled model contains invalid number of nodes.

I believe this check is aimed more at x64, where we support blocked layouts in the CPU plugin. Please see the brief comment. Since MaxPool forces blocked shapes on x64 platforms, x64 expects reorders around it (we use these reorder ops to change tensor layouts), so the expected node count is 4 (Reorder + MaxPool + Reorder + Subgraph). AArch64 forces nothing, so there are no reorder ops and the expected node count differs. But I've just found that we inserted MaxPool on the model inputs due to old, outdated limitations in Snippets that were removed long ago. So I suggest setting an empty vector here instead of the vector with MaxPool. The tests should then expect only one op after model compilation - the Subgraph. After that, you need to replace the first number in each pair with 1. I believe these tests will then be green on both platforms: x64 and aarch64.

By the way, I've found the legacyFuse tests here. They are relevant only for x64 platforms: on x64, a Conv op can fuse a FakeQuantize op on its output for better performance. This is not supported on aarch64. So I suggest wrapping this whole namespace (with its test instances) in #ifdef OPENVINO_ARCH_X86_64 ... #endif, since the tests are needed and valid only on x64 platforms.

If you have more questions, feel free to ask them! 😊

@srinjoydutta03 (Contributor) commented Jan 27, 2025

Thank you so much for the help :).

I would think that for the other tests, per_channel and per_channel_inputs, I should also set the first parameter to 1, since both of those tests also use Reorder and MaxPool ops.

I have enclosed the INSTANTIATE_TEST_SUITE_P under the legacyFuse namespace within #ifdef and #endif directives.

With these changes the tests now pass, with 6 tests skipped for 16-bit floating-point precisions.


@a-sidorova (Contributor, Author)

@srinjoydutta03 thank you for sharing the status! Now we're waiting for a PR from you! 😊

@a-sidorova a-sidorova moved this from Assigned to In Review in Good first issues Jan 28, 2025
@a-sidorova (Contributor, Author) commented Jan 28, 2025

@srinjoydutta03 regarding our discussion about next tasks that might interest you.

At the moment, we have the following ARM-related tasks that already have an assignee, but she has taken many issues without any activity on these tasks:

Please just leave a comment on the issue that interests you, and I will reassign it to you! 😊
