Remove f16, bf16 from node's evaluate methods v2 #22674
Conversation
Since this change, most operations that override Node::evaluate no longer instantiate evaluate methods for f16 and bf16. There are a few exceptions; we still keep f16 (and/or bf16) evaluates for the following operations:
- Ceiling
- Convert
- FakeConvert

The primary reason is to reduce binary size: the change saves around 200 KB. The change is transparent to the caller, so you can still evaluate f16/bf16 operations, but internally they'll be executed with f32 precision (see the sketch below).

Ticket: CVS-108489
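To illustrate the caller-transparent behavior, here is a minimal sketch of the element-by-element round trip for a generic Abs-like op. `ov::float16` is the real OpenVINO half type; the function itself is illustrative and not the PR's code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

#include "openvino/core/type/float16.hpp"

// Illustrative sketch: the caller still hands in f16 data, but the
// arithmetic runs in f32; each element is upcast, computed, and downcast.
std::vector<ov::float16> evaluate_abs_f16(const std::vector<ov::float16>& in) {
    std::vector<ov::float16> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float x = static_cast<float>(in[i]);  // f16 -> f32
        out[i] = ov::float16(std::fabs(x));         // compute in f32, then f32 -> f16
    }
    return out;
}
```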
Force-pushed from cef8252 to 2c14955
Force-pushed from 2d3c74c to a9a0849
#include "ov_ops/type_relaxed.hpp" | ||
|
||
const ov::element::TypeVector& ov::util::unsupported_types() { | ||
static const ov::element::TypeVector types{ov::element::f16, ov::element::bf16}; |
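As a hedged usage sketch (the helper below is hypothetical and not part of this diff), a caller could consult `unsupported_types()` to decide whether the f32 fallback applies to a given element type:

```cpp
#include <algorithm>

#include "openvino/core/type/element_type.hpp"
// unsupported_types() is declared in the header added by this PR.

// Hypothetical helper: true if the element type is one of those whose
// evaluate methods were removed and therefore needs the f32 round trip.
bool needs_f32_fallback(const ov::element::Type& et) {
    const auto& types = ov::util::unsupported_types();
    return std::find(types.begin(), types.end(), et) != types.end();
}
```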
Just an idea: as far as I know, some plugins use conversion from i64 to i32. Would it be useful to apply the same approach (as with f16 -> f32) for those element types?
The goal was to remove f16 and bf16 from the core ops to reduce binary size, but constant folding and shape inference (which run before the plugin conversion is applied) still require these precisions for calculations; that is why values are converted to f32 without changing the model. In core, when an operator does calculations on f16/bf16, the values are converted to float and then back (element by element).
The i64 and i32 types, by contrast, are common in shape calculations (constant folding and shape inference), and applying a conversion there may introduce data copies; I think native support is the better option there. See the sketch below.
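To make the data-copy point concrete, here is a minimal sketch (an assumed helper, not OpenVINO code) of what lowering i64 shapes to i32 would entail:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: lowering an i64 shape tensor to i32 needs a second
// buffer and a per-element copy (and the reverse on the way back), which
// native i64 evaluates avoid entirely. Note the potential truncation, too.
std::vector<int32_t> downcast_shape(const std::vector<int64_t>& shape_i64) {
    std::vector<int32_t> shape_i32(shape_i64.size());  // extra allocation
    for (std::size_t i = 0; i < shape_i64.size(); ++i) {
        shape_i32[i] = static_cast<int32_t>(shape_i64[i]);  // extra copy, may truncate
    }
    return shape_i32;
}
```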
What about a further reduction at the cost of removing i32 and using i64 for it? And u32 -> u64? Or other integer types like i8 and i16?
…ding is omitted (#26756)
Details: This is a modification of #22674. f16 LLM compilation time on ARM is unreasonably long (llama was tested). The perf report shows that every ConstantFolding transformation takes several seconds even if the graph is not modified. The root cause is the util::convert_to_supported_precision call, which happens even when constant folding is skipped. The suggested fix is to skip the util::convert_to_supported_precision call if folding is not applied.
Tickets: CVS-152428
Co-authored-by: Aleksandr Voron <[email protected]>
Co-authored-by: Andrii Staikov <[email protected]>
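A rough sketch of the fix's shape, under the assumption that folding status is known at the call site. The helper and the converter's signature below are assumptions; only the name util::convert_to_supported_precision comes from the commit message:

```cpp
#include <memory>

#include "openvino/core/node.hpp"

// Both declarations are assumptions for this sketch: the helper is
// hypothetical, and the converter's real signature may differ.
bool try_constant_fold(const std::shared_ptr<ov::Node>& node);
namespace ov::util {
void convert_to_supported_precision(const std::shared_ptr<ov::Node>& node);
}

// Illustrative control flow, not the actual diff: run the conversion only
// when folding changed something, so an unmodified graph skips the pass.
void fold_and_convert(const std::shared_ptr<ov::Node>& node) {
    if (try_constant_fold(node)) {
        ov::util::convert_to_supported_precision(node);
    }
}
```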