
Remove f16, bf16 from node's evaluate methods v2 #22674

Merged

Conversation

mateusztabaka (Contributor) commented:

With this change, most operations that override Node::evaluate no longer instantiate evaluate methods for f16 and bf16. There are a few exceptions - we still keep f16 (and/or bf16) evaluates for the following operations:

  • Ceiling
  • Convert
  • FakeConvert

The primary reason for this is to reduce binary size: the change saves around 200 KB. It is transparent to the caller, so you can still evaluate f16/bf16 operations, but internally they will be executed in f32 precision.
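To illustrate the internal behavior, here is a minimal, self-contained sketch of the fallback pattern (illustrative only, not OpenVINO code; the f16 struct, add_f32 kernel, and add_f16 wrapper are hypothetical stand-ins): inputs are widened element by element to f32, the single instantiated f32 kernel runs, and the result is narrowed back to the caller's precision.

#include <cstddef>
#include <vector>

// Hypothetical 16-bit float stand-in; OpenVINO provides a real ov::float16 type.
struct f16 {
    float value;                                     // stored widened for brevity
    explicit f16(float f) : value(f) {}
    explicit operator float() const { return value; }
};

// The only kernel that is actually instantiated: plain f32 addition.
void add_f32(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Caller-facing f16 evaluate: upcast, run the f32 kernel, downcast.
std::vector<f16> add_f16(const std::vector<f16>& a, const std::vector<f16>& b) {
    std::vector<float> a32, b32, out32(a.size());
    for (const f16& x : a) a32.push_back(static_cast<float>(x));
    for (const f16& x : b) b32.push_back(static_cast<float>(x));
    add_f32(a32.data(), b32.data(), out32.data(), a.size());
    std::vector<f16> result;
    for (float x : out32) result.push_back(f16(x));
    return result;
}

The intermediate f32 buffers never leave the evaluate call, which is what keeps the change transparent to callers.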

Ticket: CVS-108489

@mateusztabaka mateusztabaka requested a review from a team as a code owner February 6, 2024 09:23
@github-actions github-actions bot added the category: Core OpenVINO Core (aka ngraph) label Feb 6, 2024
@mateusztabaka mateusztabaka force-pushed the remove_f16_bf16_evaluates_v2 branch from cef8252 to 2c14955 on February 6, 2024 13:32
@github-actions github-actions bot added the category: CPP API OpenVINO CPP API bindings label Feb 13, 2024
@mateusztabaka mateusztabaka force-pushed the remove_f16_bf16_evaluates_v2 branch from 2d3c74c to a9a0849 on February 14, 2024 13:32
@mateusztabaka mateusztabaka requested a review from praasz February 15, 2024 21:51
@praasz praasz added this to the 2024.1 milestone Feb 19, 2024
#include "ov_ops/type_relaxed.hpp"

const ov::element::TypeVector& ov::util::unsupported_types() {
static const ov::element::TypeVector types{ov::element::f16, ov::element::bf16};
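As a hedged usage sketch (not part of the PR; needs_f32_fallback is a hypothetical helper and assumes the header declaring ov::util::unsupported_types() is included), a call site could consult this list to decide whether an evaluate has to take the f32 fallback path:

#include <algorithm>

// Illustrative helper: true when the element type has no native evaluate
// and therefore goes through the f32 fallback.
bool needs_f32_fallback(const ov::element::Type& type) {
    const auto& types = ov::util::unsupported_types();
    return std::find(types.begin(), types.end(), type) != types.end();
}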
A contributor commented:

Just an idea: as far as I know, some plugins use conversion from i64 to i32. Would it be useful to apply the same approach (as with f16 -> f32 here) to those element types as well?

Another contributor replied:

The goal was to remove f16 and bf16 from the core ops to reduce binary size, but constant folding and shape inference (which run before the plugin conversions are applied) still require these precisions for calculations. That is why the values are converted to f32 without changing the model: in core, when an operator computes on f16 or bf16, the values are converted to float and then back, element by element.

i64 and i32 are common types for shape calculations (constant folding and shape inference), and applying a conversion there may introduce data copies, so I think native support is better.

@itikhono itikhono added this pull request to the merge queue Feb 21, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 21, 2024
@itikhono itikhono added this pull request to the merge queue Feb 21, 2024
github-merge-queue bot pushed a commit that referenced this pull request Feb 21, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 21, 2024
@itikhono itikhono requested review from rkazants and slyalin February 21, 2024 11:24
@rkazants (Member) left a comment:

What about further reduction at the cost of removing i32 and using i64 for it? And u32 -> u64? Or other integer types like i8 and i16?

@itikhono itikhono added this pull request to the merge queue Feb 22, 2024
Merged via the queue into openvinotoolkit:master with commit a4dcf65 Feb 22, 2024
103 checks passed
@mateusztabaka mateusztabaka deleted the remove_f16_bf16_evaluates_v2 branch February 26, 2024 08:26
github-merge-queue bot pushed a commit that referenced this pull request Oct 18, 2024
…ding is omitted (#26756)

Details:
This is a modification of #22674. f16 LLM compilation time on ARM (llama was tested) is unreasonably long. The perf report shows that every ConstantFolding transformation takes several seconds even when the graph is not modified. The root cause is the util::convert_to_supported_precision call, which happens even when constant folding is skipped. The suggested fix is to skip the util::convert_to_supported_precision call when folding is not applied.

Tickets:
CVS-152428

---------

Co-authored-by: Aleksandr Voron <[email protected]>
Co-authored-by: Andrii Staikov <[email protected]>
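A minimal sketch of the fix described in that commit message (illustrative only; the Node struct and the function names below are hypothetical stand-ins, not the actual ConstantFolding pass or ov::util::convert_to_supported_precision): the precision-conversion helper is invoked only for nodes that were actually folded.

#include <vector>

// Hypothetical stand-ins; the real pass operates on ov::Node and calls
// ov::util::convert_to_supported_precision.
struct Node {
    bool all_inputs_constant = false;
};

// Pretend folding succeeds only when every input is constant.
bool try_fold(Node& node) {
    return node.all_inputs_constant;
}

// The expensive f16/bf16 -> f32 rewrite whose unconditional execution
// caused the slow compile times on ARM.
void convert_to_supported_precision(Node&) {}

void run_constant_folding(std::vector<Node>& nodes) {
    for (auto& node : nodes) {
        if (!try_fold(node))
            continue;  // the fix: skip the conversion when nothing was folded
        convert_to_supported_precision(node);
    }
}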
CuriousPanCake added a commit to CuriousPanCake/openvino that referenced this pull request Nov 6, 2024
Labels: category: Core (OpenVINO Core, aka ngraph); category: CPP API (OpenVINO CPP API bindings)