Alexander Veysov edited this page Dec 21, 2021 · 9 revisions
  • Are mobile / edge / ARM supported?

    • ONNX runtime should have mobile / edge / ARM builds (16 kHz sampling rate only for now)

    • PyTorch has lite builds for mobile

    • According to users, running ONNX runtime on ARM is easier than PyTorch

    • Also according to users on Android:

      (screenshot: user-reported Android benchmark)

    • On Linux x86_64:

      (screenshot: user-reported Linux x86_64 benchmark)

  • Are sampling rates other than 8000 Hz and 16000 Hz supported?

    The JIT model supports both 8000 Hz and 16000 Hz; the ONNX model supports 16000 Hz only. Other values are not directly supported, but multiples of 16000 Hz (e.g. 32000 or 48000) are cast down to 16000 Hz inside the JIT model!

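    The casting above can be illustrated with plain integer decimation, which works because the original rate is an exact multiple of the target rate. This is only a sketch of the idea; the actual resampling inside the JIT model may use a different method.

```python
# Illustrative sketch: reducing a 48000 Hz signal to 16000 Hz by keeping
# every n-th sample. This assumes the original rate is an exact integer
# multiple of the target rate; the model's internal resampling may differ.

def decimate(samples, orig_sr, target_sr=16000):
    """Downsample by an integer factor; orig_sr must be a multiple of target_sr."""
    if orig_sr % target_sr != 0:
        raise ValueError("orig_sr must be an integer multiple of target_sr")
    step = orig_sr // target_sr
    return samples[::step]

audio_48k = list(range(48))           # pretend 1 ms of 48 kHz audio
audio_16k = decimate(audio_48k, 48000)
print(len(audio_16k))                 # 16 samples remain
```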

  • How to tune the hyperparameters?

    Though for the majority of use cases no tuning is necessary by design, a good start is to plot the probabilities and then select threshold, min_speech_duration_ms, window_size_samples and min_silence_duration_ms. See this discussion and the docstrings for examples.
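    To illustrate how these parameters interact, here is a deliberately simplified sketch that turns per-window probabilities into speech segments. The real get_speech_timestamps utility in the repository is more elaborate; the function below only shows the roles of threshold, min_speech_duration_ms and min_silence_duration_ms.

```python
# Simplified sketch (not the library's actual algorithm): convert per-window
# speech probabilities into (start_ms, end_ms) segments.

def probs_to_segments(probs, threshold=0.5, window_ms=32,
                      min_speech_duration_ms=250,
                      min_silence_duration_ms=100):
    """Return (start_ms, end_ms) speech segments from window probabilities."""
    segments = []
    start = None          # index of first speech window in the current segment
    last_speech = None    # index of the last window above threshold
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None:
            # close the segment once silence has lasted long enough
            if (i - last_speech) * window_ms >= min_silence_duration_ms:
                if (last_speech - start + 1) * window_ms >= min_speech_duration_ms:
                    segments.append((start * window_ms,
                                     (last_speech + 1) * window_ms))
                start = last_speech = None
    # flush a trailing segment if it is long enough
    if start is not None and (last_speech - start + 1) * window_ms >= min_speech_duration_ms:
        segments.append((start * window_ms, (last_speech + 1) * window_ms))
    return segments

probs = [0.1] * 5 + [0.9] * 10 + [0.1] * 10 + [0.9] * 2 + [0.1] * 5
print(probs_to_segments(probs))   # the 2-window burst is shorter than 250 ms
```

    Lowering threshold or min_speech_duration_ms would keep the short burst; raising min_silence_duration_ms merges segments separated by brief pauses.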

  • Which sampling rate and chunk size to choose?

    This should give you some idea; also please see the docstrings for sensible base values. Typically anything higher than 16 kHz is not required for speech, and the model will most likely have problems with extremely long chunks.
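    A small helper relating chunk duration in milliseconds to a chunk size in samples can make picking window_size_samples less error-prone. The exact window sizes the model accepts are listed in the repository docstrings; this just does the unit conversion.

```python
# Convert a chunk duration in milliseconds to a sample count at a given
# sampling rate; useful when reasoning about window_size_samples.

def ms_to_samples(ms, sampling_rate):
    return int(sampling_rate * ms / 1000)

print(ms_to_samples(32, 16000))   # 512 samples at 16 kHz
print(ms_to_samples(32, 8000))    # 256 samples at 8 kHz
```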

  • Do models keep state, should chunks be sent sequentially?

    Yes. Though the models were designed for streaming, they can also be used to process long audio files; please see the provided utils. The JIT model, for example, has a model.reset_states() method.
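    The statefulness can be illustrated with a toy class (entirely hypothetical, not the real model): output depends on accumulated state, so chunks must arrive in order, and the state should be reset between unrelated audio files. The real JIT model exposes the same idea through model.reset_states().

```python
# Toy stateful detector, for illustration only: shows why chunk order matters
# and why states should be reset between unrelated audios.

class ToyStatefulVAD:
    def __init__(self, decay=0.5):
        self.decay = decay
        self.state = 0.0   # carried across chunks, like the model's recurrent state

    def __call__(self, chunk_energy):
        # output depends on both the current chunk and the accumulated state
        self.state = self.decay * self.state + (1 - self.decay) * chunk_energy
        return self.state

    def reset_states(self):
        self.state = 0.0

vad = ToyStatefulVAD()
out_first = [vad(e) for e in [1.0, 1.0, 0.0]]   # first audio, chunks in order
vad.reset_states()                               # start fresh for the next audio
out_second = vad(1.0)                            # same as the very first chunk
```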

  • TensorFlow or TensorFlow Lite?

    Link.

  • Papers, datasets, training code, etc

    As of this moment, we have not published any of these for lack of time and motivation. For citations and further reading, please see the links in the README.
