Handling unsupported OperandType #36
@huningxin Should "tensor-quant8-asymm" simply be "tensor-int8"? Does the current naming imply that the content of the tensor is coupled with quantization data, e.g. a zero point? It might be cleaner to define the data type here independently of the operation (e.g. quantization) that originated it.
Yes. It is supposed to be used together with
I notice that native APIs support quantization in different ways: ONNX defines quantization operations, such as QLinearConv, while NNAPI defines quantized tensor types. We may need to create a new issue to investigate compatibility among them. WDYT?
@huningxin My feedback is not about compatibility. If this enum is meant to describe the data type of an operand, then it should not mix with the notion of a process to be applied with that operand. It just makes it more confusing and hard to evolve over time. If an API requires or produces quantized data blocks, naturally it will also require or produce quantization parameters e.g. zero point and scale (for linear quantization). The fact that the quantized data type is int8 or int4 is independent of the quantization process the data type is involved in. Therefore, defining an enum that mixes the two concepts together is not a good design and makes it hard to extend it in the future. |
@wchao1115, thanks for your feedback. This enum is used by
@huningxin You can define
This sounds like a good design. @wchao1115, could you please open a new issue for this proposal? Thanks.
I agree. To rephrase the question as I understand it: do the quantization parameters (zero point/scale) have to be part of the type (fixed statically), or can we let them be determined dynamically?
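The two options in that question can be made concrete with a small sketch (hypothetical names, not the spec API): (a) parameters fixed statically as part of the operand's type, or (b) parameters supplied as ordinary runtime inputs.

```javascript
// Illustrative only: both shapes below are made up for this comparison.

// (a) Static: quantization parameters are baked into the operand descriptor,
// so they are known at graph-construction time and cannot change per run.
const staticOperand = {
  type: 'tensor-quant8-asymm',
  dimensions: [10],
  scale: 0.05,
  zeroPoint: 128,
};

// (b) Dynamic: the operand carries only a storage type, and the parameters
// are fed as separate inputs at execution time (e.g. computed per batch).
const dynamicOperand = { type: 'tensor-int8', dimensions: [10] };
const scaleInput = { type: 'float32' };
const zeroPointInput = { type: 'int32' };
```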
Created new issue #44.
@wchao1115, do you think we still need this issue? I suppose #44 is only about redesigning the quantization, am I correct?
Yes |
Per the discussion in the Feb 20 CG meeting, reopening this issue. When hardware doesn't support a data type in a model, the compilation should fail with an exception. The proposal would depend on #46, which changes the compilation interface.
When an API throws an exception, it usually means it hit a condition it doesn't expect and that its immediate caller can't easily recover from, e.g. a hardware failure in the middle of an I/O operation. This is why an exception is not the same as a returned error code, and why a catch statement at the call site should be rare. A data type not supported by the underlying hardware is an error code, not an exception, because there is a defined behavior for what the immediate caller of the API should do to recover from that condition, i.e. compile with a fallback option. To handle this situation, we will need to look at the current
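A minimal sketch of the error-code-plus-fallback pattern described above, assuming a made-up device-capability table and compile function (nothing here is the actual API):

```javascript
// Hypothetical device capability table; names and types are illustrative.
const deviceSupport = {
  'npu': new Set(['tensor-float16', 'tensor-int8']),
  'cpu': new Set(['tensor-float32', 'tensor-float16', 'tensor-int8']),
};

// Compilation reports an unsupported operand type as a structured result
// (an "error code") rather than throwing, because the caller has a defined
// recovery path: try the next device.
function compile(model, device) {
  const missing = model.operandTypes.filter(
    (t) => !deviceSupport[device].has(t));
  return missing.length === 0
    ? { ok: true, device }
    : { ok: false, error: 'unsupported-operand-type', missing };
}

function compileWithFallback(model, devices) {
  for (const device of devices) {
    const result = compile(model, device);
    if (result.ok) return result; // recovered without any try/catch
  }
  // Only when no fallback exists is the condition truly exceptional.
  throw new Error('no device supports this model');
}
```

With this shape, an app targeting float32 first tries the NPU, gets an unsupported-operand-type result, and silently falls back to the CPU.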
Was this solved by #50?
Indeed we can close this per #50 (comment), @wchao1115 please feel free to re-open if you think otherwise. |
The current spec defines the OperandType enum. The float16 and tensor-float16 types are being added in #35. However, as mentioned by @wchao1115 in #26 (comment), there are situations where the selected device doesn't have native support for an OperandType. For example, some CPUs may not support tensor-float16, some GPUs may not support tensor-quant8-asymm, and some AI accelerators may not support tensor-float32. To allow the app to gracefully handle these situations, e.g. select a different device or use a different model with a supported operand type, the API should report an unsupported OperandType error.

Opening this issue to explore the definition of the unsupported OperandType error and the API behavior for returning it.
Thoughts?