Add documentation on how to convert QuartzNet model (#4664)

* Add documentation on how to convert QuartzNet model (#4422)

* Add documentation on how to convert QuartzNet model
* Apply review feedback
* Small fix
* Apply review feedback
* Apply suggestions from code review

Co-authored-by: Anastasiya Ageeva <[email protected]>
Co-authored-by: Anastasiya Ageeva <[email protected]>

* Add reference to file

Co-authored-by: Anastasiya Ageeva <[email protected]>
openvinotoolkit · Mar 9, 2021 · 02d2dbd
1 parent bfe0748

Showing 2 changed files with 33 additions and 0 deletions.
diff --git a/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_QuartzNet.md b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_QuartzNet.md
@@ -0,0 +1,32 @@
+# Convert PyTorch* QuartzNet to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_QuartzNet}
+
+The [NeMo project](https://github.com/NVIDIA/NeMo) provides the QuartzNet model.
+
+## Download the Pre-Trained QuartzNet Model
+
+To download the pre-trained model, refer to the [NeMo Speech Models Catalog](https://ngc.nvidia.com/catalog/models/nvidia:nemospeechmodels).
+The following code downloads QuartzNet and exports it to ONNX* format:
+```python
+import nemo
+import nemo.collections.asr as nemo_asr
+
+quartznet = nemo_asr.models.ASRConvCTCModel.from_pretrained(model_info='QuartzNet15x5-En')
+# Export QuartzNet model to ONNX* format
+quartznet.export('qn.onnx')
+```
+This code produces three ONNX* model files: `encoder_qn.onnx`, `decoder_qn.onnx`, and `qn.onnx`.
+They are the encoder, the decoder, and the combined `decoder(encoder(x))` model, respectively.
+
+## Convert ONNX* QuartzNet Model to IR
+
+To convert the combined model:
+```sh
+./mo.py --input_model <MODEL_DIR>/qn.onnx --input_shape [B,64,X]
+```
+To convert the encoder and decoder separately:
+```sh
+./mo.py --input_model <MODEL_DIR>/encoder_qn.onnx --input_shape [B,64,X]
+./mo.py --input_model <MODEL_DIR>/decoder_qn.onnx --input_shape [B,1024,Y]
+```
+
+Here, the shape depends on the length of the audio file's Mel spectrogram: `B` is the batch dimension, `X` is determined by the input length, and `Y` is determined by the encoder output, usually `X / 2`.
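The shape rule in the new page (`X` from the input length, `Y` usually `X / 2`) can be turned into concrete `--input_shape` values with a short sketch. It assumes QuartzNet15x5's usual preprocessing (16 kHz audio, a 160-sample hop between Mel frames, a centered STFT) and a single stride-2 layer in the encoder; none of these parameters come from the page itself, so check them against the model's NeMo config before relying on the numbers. The helpers `mel_frames`, `encoder_frames`, and `input_shapes` are hypothetical and not part of NeMo or the Model Optimizer.

```python
from typing import Tuple

# Assumed QuartzNet15x5 preprocessing parameters; these are NOT taken from the
# page above, so check them against the model's NeMo config.
SAMPLE_RATE = 16_000     # audio sample rate in Hz
HOP_LENGTH = 160         # samples between Mel frames (10 ms at 16 kHz)
N_MELS = 64              # the 64 in --input_shape [B,64,X]
ENCODER_CHANNELS = 1024  # the 1024 in --input_shape [B,1024,Y]


def mel_frames(duration_s: float) -> int:
    """Approximate Mel-spectrogram length X for a clip of duration_s seconds."""
    n_samples = round(duration_s * SAMPLE_RATE)
    # A centered STFT produces one frame per hop, plus one extra frame.
    return n_samples // HOP_LENGTH + 1


def encoder_frames(x: int) -> int:
    """Encoder output length Y, assuming a single stride-2 layer.

    Computes ceil(X / 2) with integer arithmetic; for even X this is X / 2.
    """
    return (x + 1) // 2


def input_shapes(batch: int, duration_s: float) -> Tuple[str, str]:
    """Return hypothetical --input_shape values for the encoder and decoder."""
    x = mel_frames(duration_s)
    y = encoder_frames(x)
    return f"[{batch},{N_MELS},{x}]", f"[{batch},{ENCODER_CHANNELS},{y}]"


if __name__ == "__main__":
    encoder_shape, decoder_shape = input_shapes(batch=1, duration_s=5.0)
    print("encoder / combined model:", encoder_shape)
    print("decoder:", decoder_shape)
```

For a 5-second clip at batch size 1, this sketch gives `[1,64,501]` for the encoder (or combined model) and `[1,1024,251]` for the decoder. Treat these as estimates: the real NeMo preprocessor may also pad the frame count.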
diff --git a/docs/doxygen/ie_docs.xml b/docs/doxygen/ie_docs.xml
@@ -53,6 +53,7 @@ limitations under the License.
                             <tab type="user" title="Convert ONNX* Faster R-CNN Model to the Intermediate Representation" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_Faster_RCNN"/>
                             <tab type="user" title="Convert ONNX* Mask R-CNN Model to the Intermediate Representation" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_Mask_RCNN"/>
                             <tab type="user" title="Converting DLRM ONNX* Model" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_DLRM"/>
+                            <tab type="user" title="Convert PyTorch* QuartzNet Model" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_QuartzNet"/>
                         </tab>
                         <tab type="user" title="Model Optimizations Techniques" url="@ref openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques"/>
                         <tab type="user" title="Cutting off Parts of a Model" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_Cutting_Model"/>