Skip to content

Latest commit

 

History

History
33 lines (16 loc) · 1.63 KB

File metadata and controls

33 lines (16 loc) · 1.63 KB

Trainable Quantizers Infrastructure

The trainable infrastructure is a module containing quantization abstraction and quantizers for hardware-oriented model optimization tools such as the Model Compression Toolkit (MCT).

It provides the required abstraction for trainable quantization methods such as quantization-aware training.

It utilizes the Inferable Quantizers Infrastructure provided by the MCT Quantizers package, which proposes the required abstraction for emulating inference-time quantization.

High level description

For each layer, we use a "Quantization Wrapper" to wrap the layer's weight quantizers, and an "Activation Quantization Holder" to hold the activation quantizers. Both components are provided by the MCT Quantizers package. We can choose the quantizers and all the quantization information for each layer by initializing the weights_quantizer and activation_quantizer API.

Notice that the quantization wrapper, holder and the quantizers are implemented per framework.

Quantizers

The quantizers in this module are built upon the "Inferable Quantizer" abstraction (from MCT Quantizers), and define the "Trainable Quantizer" framework, which contains learnable quantization parameters that can be optimized during training.

Details and Examples

More details and "how to" examples for TensorFlow can be found in:

Trainable quantizers for TensorFlow

And for PyTorch:

Trainable quantizers for PyTorch