
[Good First Issue] [Snippets] [ARM]: Enable FakeQuantize tokenization #28508

Open
a-sidorova opened this issue Jan 17, 2025 · 7 comments · May be fixed by #28700

Labels: category: CPU (OpenVINO CPU plugin) · good first issue (Good for newcomers) · platform: arm (OpenVINO on ARM / ARM64)
@a-sidorova (Contributor)

Context

Snippets is a highly specialized JIT (just-in-time) compiler for computational graphs. It provides a scalable approach to operation fusion and enablement. Like a typical compiler, Snippets has a frontend (tokenizer), an optimizer, and a backend (generator).

The first step of the Snippets pipeline, tokenization, identifies the parts of the initial ov::Model that Snippets can lower efficiently and tokenizes them into a single node - a Subgraph.
The second step (the optimizer) applies common and device-specific optimizations to the Subgraph and produces a lowered representation of it.
Finally, the last stage is code emission. The target generator maps every operation in the IR to a binary code emitter (JIT emitter), which is then used to produce a piece of executable code. The result is an executable that performs the calculations described by the initial input ov::Model.

The purpose of this issue is to enable tokenization of the FakeQuantize operation in Snippets for ARM devices.
Snippets decomposes FakeQuantize into several simple elementwise operations using the FakeQuantizeDecomposition pass. This pass is called after the op has been tokenized into a Subgraph.

Prerequisites

It is recommended to use an ARM CPU based platform for development (e.g. a Mac, a Raspberry Pi, etc.). Cross-compilation together with an emulator (e.g. QEMU) is also an option: cmake -DCMAKE_TOOLCHAIN_FILE=../cmake/arm64.toolchain.cmake ...

What needs to be done?

  • First, enable the tests that are currently disabled on aarch64 platforms. To do that, update this line to enable the smoke*Snippets*_FQDecomposition_* tests. Launch the tests (see the Tests section below for how to run them). Some tests should fail.
  • Support all the missing operations encountered during the decomposition pass in the ARM64 code generation.
  • Enable FakeQuantize tokenization in the tokenizer callback in the CPU plugin - update is_supported_op.
  • Launch the tests again - they should now be green.

Tests

Tests are disabled in the default build, so make sure to add -DENABLE_TESTS=ON to the cmake command.

GoogleTest is used for testing. The CPU functional test target is ov_cpu_func_tests. You can use a GoogleTest filter:

./bin/[platform]/[build_type]/ov_cpu_func_tests --gtest_filter="*smoke*Snippets*FQDecomposition*"

Contact points

@a-sidorova, @dmitry-gorokhov

@a-sidorova a-sidorova added category: CPU OpenVINO CPU plugin good first issue Good for newcomers platform: arm OpenVINO on ARM / ARM64 labels Jan 17, 2025
@github-project-automation github-project-automation bot moved this to Contributors Needed in Good first issues Jan 17, 2025
@srinjoydutta03 (Contributor)

.take

Contributor

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

@a-sidorova a-sidorova moved this from Contributors Needed to Assigned in Good first issues Jan 20, 2025
@srinjoydutta03 (Contributor) commented Jan 24, 2025

Hi @a-sidorova, I ran the tests and, as expected, they failed. However, I don't fully understand the test output.

Here is one of the tests for reference:

smoke_Snippets_FQDecomposition_Scalars/FakeQuantizeDecompositionTest.CompareWithRefImpl/IS=(1.3.16.16)_netPRC=f32_D=CPU_IN=f32_OP=Abs_opset1_ON1=Subgraph_ON1=Abs,fakeQuantize_LP=1SH1=[]SH2=[]SH3=[]SH4=[]
src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp:52: Failure
Expected equality of these values:
  originalLayersNames
    Which is: "Abs,fakeQuantize"
  name
    Which is: "Abs"
src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp:37: Failure
Expected equality of these values:
  ref_num_nodes
    Which is: 4
  num_nodes
    Which is: 3
Compiled model contains invalid number of nodes.

  • From the log, I understand that the fakeQuantize node is being decomposed but not fused into the Subgraph. I'm also confused about why num_nodes is 3 while only "Abs" is recognized. I would think one node is "Abs"; the other two are not logged (I would guess they are the Maximum and Minimum nodes from the decomposition pass).
  • As far as supporting missing operations is concerned, on inspecting the decomposition pass here, I found that all the operations (Maximum, Minimum, Add, Subtract, Multiply, Round, ConvertSaturation and Divide) already have corresponding jitters in cpu_generator. I'm not sure which operation is missing during the code generation phase.
  • Since all the operations already have corresponding jitters, I thought that including ov::is_type<ov::op::v0::FakeQuantize>(n) in is_supported_op in the transformation pipeline should make it work, but it doesn't.

I'm sure I am missing something, please guide me through this. Thanks.

@a-sidorova (Contributor, Author)

@srinjoydutta03 thank you for the questions!

As far as supporting missing operations is concerned, on inspecting the decomposition pass here, I found all the operations (Maximum, Minimum, Add, Subtract, Multiply, Round, ConvertSaturation and Divide) already have their corresponding jitters in cpu_generator. I'm not sure which operation is missing during code generation phase

You're absolutely right - all emitters are already implemented. If an emitter were missing, you would see the following exception from the target machine/generator: Check 'jitter != jitters.end()' failed at src/common/snippets/src/lowered/target_machine.cpp:19: Supported precisions set is not available for X operation. It means that the CPU generator knows nothing about operation X and doesn't know how to compile the code or which precisions are supported. Since you don't see this exception, the CPU generator supports everything needed.

Also, let me help you with the test logs:

src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp:52: Failure
Expected equality of these values:
  originalLayersNames
    Which is: "Abs,fakeQuantize"
  name
    Which is: "Abs"

It means the test expects that the execution model (the state after ov::Model compilation) will contain a Subgraph op containing the Abs and FakeQuantize ops (what was originally tokenized by Snippets). The test shows that currently the Subgraph has only Abs, and FakeQuantize has not been tokenized. This part of the failure should be fixed by adding ov::is_type<ov::op::v0::FakeQuantize>(n) to is_supported_op in the transformation pipeline, as you already did 😃

src/tests/functional/shared_test_classes/src/base/snippets_test_utils.cpp:37: Failure
Expected equality of these values:
  ref_num_nodes
    Which is: 4
  num_nodes
    Which is: 3
Compiled model contains invalid number of nodes.

I believe this check is aimed more at x64, where we support blocked layouts in the CPU plugin. Please see the brief comment. Since MaxPool forces blocked shapes on x64 platforms, x64 expects reorders around it (we use these reorder ops to change tensor layouts), so the expected node count is 4 (Reorder + MaxPool + Reorder + Subgraph). AArch64 forces nothing, so there are no reorder ops and the expected node count differs. But I've just found that we inserted MaxPool on the model inputs due to old, outdated limitations in Snippets that were removed long ago. So I suggest setting an empty vector here instead of the vector with MaxPool. The tests should then expect only one op after model compilation - the Subgraph. After that, you need to replace the first number in each pair with 1. I believe these tests will then be green on both platforms: x64 and aarch64.

By the way, I've found the legacyFuse tests here. They are relevant only for x64 platforms: on x64, a Conv op can fuse a FakeQuantize op on its output for better performance. This is not supported on aarch64. So I suggest wrapping this whole namespace (with its test instances) in #ifdef OPENVINO_ARCH_X86_64 ... #endif, since the tests are needed and valid only on x64 platforms.

If you have more questions, feel free to ask them! 😊

@srinjoydutta03 (Contributor) commented Jan 27, 2025

Thank you so much for the help :).

I would think that for the other tests, per_channel and per_channel_inputs, I should also set the first parameter to 1, since both of those tests also use Reorder and MaxPool ops.

I have enclosed the INSTANTIATE_TEST_SUITE_P under the legacyFuse namespace within #ifdef and #endif directives.

With these changes the tests now pass, with 6 tests skipped for 16-bit floating-point precisions.


@a-sidorova (Contributor, Author)

@srinjoydutta03 thank you for sharing the status! Now we're waiting for a PR from you! 😊

@a-sidorova a-sidorova moved this from Assigned to In Review in Good first issues Jan 28, 2025
@a-sidorova (Contributor, Author) commented Jan 28, 2025

@srinjoydutta03 regarding our discussion about next tasks that might interest you.

At the moment, we have the following ARM-related tasks that already have an assignee, but she has taken many issues without any activity on these tasks:

Please just leave a comment on the issue that interests you, and I will reassign it to you! 😊
