diff --git a/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_QuartzNet.md b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_QuartzNet.md
new file mode 100644
index 00000000000000..dde3a5688ee7a8
--- /dev/null
+++ b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_QuartzNet.md
@@ -0,0 +1,32 @@
+# Convert PyTorch* QuartzNet to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_QuartzNet}
+
+The [NeMo project](https://github.com/NVIDIA/NeMo) provides the QuartzNet model.
+
+## Download the Pre-Trained QuartzNet Model
+
+To download the pre-trained model, refer to the [NeMo Speech Models Catalog](https://ngc.nvidia.com/catalog/models/nvidia:nemospeechmodels).
+Here are the instructions on how to obtain QuartzNet in ONNX* format:
+```python
+import nemo
+import nemo.collections.asr as nemo_asr
+
+quartznet = nemo_asr.models.ASRConvCTCModel.from_pretrained(model_info='QuartzNet15x5-En')
+# Export the QuartzNet model to ONNX* format
+quartznet.export('qn.onnx')
+```
+This code produces three ONNX* model files: `encoder_qt.onnx`, `decoder_qt.onnx`, and `qn.onnx`.
+They are the `encoder`, the `decoder`, and a combined `decoder(encoder(x))` model, respectively.
+
+## Convert the ONNX* QuartzNet Model to IR
+
+If using the combined model:
+```sh
+./mo.py --input_model <MODEL_DIR>/qn.onnx --input_shape [B,64,X]
+```
+If using the separate models:
+```sh
+./mo.py --input_model <MODEL_DIR>/encoder_qt.onnx --input_shape [B,64,X]
+./mo.py --input_model <MODEL_DIR>/decoder_qt.onnx --input_shape [B,1024,Y]
+```
+
+The input shape is determined by the Mel-spectrogram length of the audio file: `B` is the batch dimension, `X` depends on the length of the input audio, and `Y` is determined by the encoder output, usually `X / 2`.
diff --git a/docs/doxygen/ie_docs.xml b/docs/doxygen/ie_docs.xml
index b7636009e5dc0f..07d9df15051024
--- a/docs/doxygen/ie_docs.xml
+++ b/docs/doxygen/ie_docs.xml
@@ -53,6 +53,7 @@ limitations under the License.
+                <tab type="user" title="Convert PyTorch* QuartzNet to the Intermediate Representation" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_QuartzNet"/>
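To pick concrete values of `X` and `Y` for the `--input_shape` arguments in the new document, `X` can be estimated from the audio itself. Below is a minimal sketch, assuming NeMo's default QuartzNet preprocessing (16 kHz audio, 64 Mel bands, 20 ms window, 10 ms hop) and using `librosa`; the file name `sample.wav` and the batch size of 1 are placeholders, and `Y ≈ X / 2` follows the note in the document rather than an exact formula.

```python
import librosa

# Placeholder input: any 16 kHz mono audio file (assumption).
audio, sr = librosa.load('sample.wav', sr=16000)

# Mel-spectrogram settings assumed to match NeMo's default QuartzNet
# preprocessor: 64 Mel bands, 20 ms window, 10 ms hop.
mel = librosa.feature.melspectrogram(
    y=audio,
    sr=sr,
    n_fft=512,
    win_length=int(0.02 * sr),  # 320 samples
    hop_length=int(0.01 * sr),  # 160 samples
    n_mels=64,
)

x = mel.shape[1]   # X: number of spectrogram frames
y = (x + 1) // 2   # Y: encoder output length, roughly X / 2 per the doc

print(f'encoder/combined: --input_shape [1,64,{x}]')
print(f'decoder:          --input_shape [1,1024,{y}]')
```

If the exact frame count matters, feeding the same audio through NeMo's own preprocessor and reading the output shape is the safer way to obtain `X`.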