Add documentation on how to convert QuartzNet model (#4664)
* Add documentation on how to convert QuartzNet model (#4422)

* Add documentation on how to convert QuartzNet model

* Apply review feedback

* Small fix

* Apply review feedback

* Apply suggestions from code review

Co-authored-by: Anastasiya Ageeva <[email protected]>

Co-authored-by: Anastasiya Ageeva <[email protected]>

* Add reference to file

Co-authored-by: Anastasiya Ageeva <[email protected]>
mvafin and avladimi authored Mar 9, 2021
1 parent bfe0748 commit 02d2dbd
Showing 2 changed files with 33 additions and 0 deletions.
@@ -0,0 +1,32 @@
# Convert PyTorch* QuartzNet to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_QuartzNet}

The [NeMo project](https://github.com/NVIDIA/NeMo) provides the QuartzNet model.

## Download the Pre-Trained QuartzNet Model

To download the pre-trained model, refer to the [NeMo Speech Models Catalog](https://ngc.nvidia.com/catalog/models/nvidia:nemospeechmodels).
The following code shows how to obtain QuartzNet in ONNX* format:
```python
import nemo
import nemo.collections.asr as nemo_asr

quartznet = nemo_asr.models.ASRConvCTCModel.from_pretrained(model_info='QuartzNet15x5-En')
# Export QuartzNet model to ONNX* format
quartznet.export('qn.onnx')
```
This code produces 3 ONNX* model files: `encoder_qt.onnx`, `decoder_qt.onnx`, `qn.onnx`.
They are the `encoder`, the `decoder`, and a combined `decoder(encoder(x))` model, respectively.

## Convert ONNX* QuartzNet Model to IR

If using a combined model:
```sh
./mo.py --input_model <MODEL_DIR>/qn.onnx --input_shape [B,64,X]
```
If using separate models:
```sh
./mo.py --input_model <MODEL_DIR>/encoder_qt.onnx --input_shape [B,64,X]
./mo.py --input_model <MODEL_DIR>/decoder_qt.onnx --input_shape [B,1024,Y]
```

Here, the shape is determined by the length of the audio file's Mel-spectrogram: `B` is the batch dimension, `X` is the dimension based on the input length, and `Y` is determined by the encoder output, usually `X / 2`.
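
As a worked illustration of the shape rules above, the following minimal sketch computes concrete `--input_shape` values for the separate encoder and decoder models. The helper name `quartznet_input_shapes` and the example values are hypothetical, and rounding `X / 2` up for odd `X` is an assumption; check `Y` against the actual encoder output before converting.

```python
def quartznet_input_shapes(batch, mel_frames):
    """Return (encoder_shape, decoder_shape) strings for --input_shape.

    Hypothetical helper for illustration; not part of Model Optimizer or NeMo.
    """
    # Encoder input: [B, 64, X], where X is the Mel-spectrogram length.
    encoder_shape = f"[{batch},64,{mel_frames}]"
    # Decoder input: [B, 1024, Y]. Y is "usually X / 2"; rounding up for
    # odd X is an assumption made for this sketch.
    decoder_frames = (mel_frames + 1) // 2
    decoder_shape = f"[{batch},1024,{decoder_frames}]"
    return encoder_shape, decoder_shape


encoder_shape, decoder_shape = quartznet_input_shapes(batch=1, mel_frames=128)
print(encoder_shape)  # [1,64,128]
print(decoder_shape)  # [1,1024,64]
```

The resulting strings can be passed as the `--input_shape` values in the commands above.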
1 change: 1 addition & 0 deletions docs/doxygen/ie_docs.xml
@@ -53,6 +53,7 @@ limitations under the License.
<tab type="user" title="Convert ONNX* Faster R-CNN Model to the Intermediate Representation" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_Faster_RCNN"/>
<tab type="user" title="Convert ONNX* Mask R-CNN Model to the Intermediate Representation" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_Mask_RCNN"/>
<tab type="user" title="Converting DLRM ONNX* Model" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_DLRM"/>
<tab type="user" title="Convert PyTorch* QuartzNet Model" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_QuartzNet"/>
</tab>
<tab type="user" title="Model Optimizations Techniques" url="@ref openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques"/>
<tab type="user" title="Cutting off Parts of a Model" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_Cutting_Model"/>