Support TF 2.15
elad-c committed Mar 21, 2024
1 parent f30f846 commit 400fcdd
Showing 1 changed file with 6 additions and 5 deletions.
FAQ.md — 11 changes: 6 additions & 5 deletions
@@ -1,12 +1,13 @@
# FAQ

**Table of Contents:**

1. [Why does the size of the quantized model remain the same as the original model size?](#1-why-does-the-size-of-the-quantized-model-remain-the-same-as-the-original-model-size)
2. [Why does loading a quantized exported model from a file fail?](#2-why-does-loading-a-quantized-exported-model-from-a-file-fail)
3. [Why am I getting a torch.fx error?](#3-why-am-i-getting-a-torchfx-error)


-## 1. Why does the size of the quantized model remain the same as the original model size?
+### 1. Why does the size of the quantized model remain the same as the original model size?

MCT performs a process known as *fake quantization*, wherein the model's weights and activations are still represented in a floating-point
format but are quantized to represent a maximum of 2^N unique values (for N-bit cases).
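
As a rough numeric illustration (a NumPy sketch, assuming a uniform quantization grid over the tensor's min/max range): the values snap to at most 2^N grid points, but they are still stored as float32, so the file size does not shrink.

```python
import numpy as np

def fake_quantize(x: np.ndarray, n_bits: int = 8) -> np.ndarray:
    """Quantize-dequantize: snap values to a 2^n_bits grid, keep the float dtype."""
    n_levels = 2 ** n_bits
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (n_levels - 1)
    q = np.round((x - x_min) / scale)            # integer grid index in [0, n_levels - 1]
    return (q * scale + x_min).astype(x.dtype)   # back to the original float dtype

w = np.random.randn(10_000).astype(np.float32)
w_fq = fake_quantize(w, n_bits=8)
print(w_fq.dtype)            # float32 -> same bytes per value as the original weights
print(np.unique(w_fq).size)  # at most 256 distinct values for 8 bits
```
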
@@ -18,13 +19,13 @@ Note that the IMX500 converter accepts the "fake quantization" model and support
For more information and an implementation example, check out the [INT8 TFLite export tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/keras/export/example_keras_export.ipynb)


-## 2. Why does loading a quantized exported model from a file fail?
+### 2. Why does loading a quantized exported model from a file fail?

The models MCT exports contain QuantizationWrappers and Quantizer objects that define and apply the quantization at inference time.
These objects are custom layers and layer wrappers created by MCT (defined in an external repository: [MCTQ](https://github.com/sony/mct_quantizers)),
and thus, MCT offers an API for loading these models from a file, depending on the framework.

-### Keras
+#### Keras

Keras models can be loaded with the following function:
@@ -33,15 +34,15 @@ import model_compression_toolkit as mct
```python
import model_compression_toolkit as mct

quantized_model = mct.keras_load_quantized_model('my_model.keras')
```

-### PyTorch
+#### PyTorch

PyTorch models can be exported as ONNX models. An example of loading a saved ONNX model can be found [here](https://sony.github.io/model_optimization/api/api_docs/modules/exporter.html#use-exported-model-for-inference).
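
For instance, the exported file can be opened with a plain `onnxruntime` session (a minimal sketch; the file name and input shape below are placeholders, and an MCT-exported model may additionally require the custom quantizer ops described in the linked documentation):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('my_model.onnx')                    # placeholder file name
input_name = session.get_inputs()[0].name
dummy_input = np.random.randn(1, 3, 224, 224).astype(np.float32)   # placeholder input shape
outputs = session.run(None, {input_name: dummy_input})
print([o.shape for o in outputs])
```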

*Note:* Running inference on an ONNX model with the `onnxruntime` package incurs high latency.
Inference on the target platform (e.g. the IMX500) is not affected by this latency.


-## 3. Why am I getting a torch.fx error?
+### 3. Why am I getting a torch.fx error?

When quantizing a PyTorch model, MCT's initial step is to convert the model into a graph representation using `torch.fx`.
However, `torch.fx` has some well-known limitations, the primary one being that the computational graph must remain static (for example, no data-dependent control flow such as `if` statements or loops that branch on tensor values).
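
For example, a module with data-dependent control flow cannot be symbolically traced (a minimal sketch of the failure mode, not an MCT API):

```python
import torch
import torch.fx

class DynamicModel(torch.nn.Module):
    def forward(self, x):
        # Data-dependent control flow: the branch taken depends on tensor values,
        # so torch.fx cannot capture a single static graph.
        if x.sum() > 0:
            return x * 2
        return x - 1

try:
    torch.fx.symbolic_trace(DynamicModel())
except Exception as err:  # tracing fails with a control-flow related tracing error
    print(type(err).__name__, err)
```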