Some VAD tests #23

AdolfVonKleist · 2021-01-21T12:12:30Z

❓ Questions and Help

This looks great, I saw your post on the KAIST VAD repo! I have two questions:

Do you plan to release any more information about the network architecture you used for training, or the training framework itself?
Have you also checked https://github.com/ina-foss/inaSpeechSegmenter? This framework is also excellent, and its gender and music classification is fantastic. It is offline, however, not online by default.

snakers4 (Maintainer) · 2021-01-21T12:35:41Z

Hi,

Do you plan to release any more information about the network architecture you used for training, or the training framework itself?

Well, the network architecture is actually public in the JIT / ONNX files
We are not really planning on publishing any papers on this topic, but our articles in The Gradient may be a good start

We actually position our VAD as a finished solution (not as a toolkit) and there may be some reasons why it cannot be easily separated from our internal processes
Nevertheless, it was designed to be as lightweight and portable as possible, with very few external dependencies: PyTorch for JIT and any ONNX backend for ONNX. The data loading pipeline is rudimentary and can simply be replaced, deleted, or rewritten in your language, since a model is just a compute graph.

Have you also checked: https://github.com/ina-foss/inaSpeechSegmenter this framework is also excellent and the gender and music classification is fantastic. It is offline however, not online by default.
It is offline however, not online by default.
if you wish GPU implementation (recommended)
https://github.com/ina-foss/inaSpeechSegmenter/blob/master/setup.py#L101
install_requires=['numpy', 'pandas', 'keras', 'scikit-image', 'sidekit', 'pyannote.algorithms', 'pyannote.core', 'pyannote.parser', 'matplotlib', 'Pyro4']
$ sudo apt-get install ffmpeg

Many thanks, gave it a quick look just now

Judging by first impressions, am I correct to assume that:

It is kind of slow (GPUs recommended)?
Depends on many components?
Does it work only with French?

Do you happen to know whether its VAD can be used on a small chunk of audio rather than the whole file?

Other heads of our VAD (number detector, language classifier) are used on whole files as well, but that is not a problem, because you can first apply the VAD in a streaming or non-streaming fashion and then just process the rest of the data in memory.
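
To illustrate the "VAD first, then process the rest in memory" idea, here is a minimal sketch. It is not the silero-vad API: `collect_speech_segments` and `toy_is_speech` are hypothetical names, and the toy decision rule only stands in for a real per-chunk VAD call.

```python
from typing import Callable, Iterable, List, Sequence

Chunk = Sequence[float]


def collect_speech_segments(
    chunks: Iterable[Chunk],
    is_speech: Callable[[Chunk], bool],
) -> List[List[float]]:
    """Apply a per-chunk speech decision to a stream and buffer
    contiguous speech chunks in memory as whole segments."""
    segments: List[List[float]] = []
    current: List[float] = []
    for chunk in chunks:
        if is_speech(chunk):
            current.extend(chunk)      # still inside a speech region
        elif current:
            segments.append(current)   # speech region just ended
            current = []
    if current:                        # stream ended during speech
        segments.append(current)
    return segments


def toy_is_speech(chunk: Chunk) -> bool:
    """Stand-in for a real VAD (assumption): any non-zero sample counts as speech."""
    return any(abs(s) > 0.0 for s in chunk)


stream = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.1], [0.0, 0.0], [0.4, 0.4]]
segments = collect_speech_segments(stream, toy_is_speech)
segment_lengths = [len(s) for s in segments]
print(segment_lengths)
```

Each buffered segment can then be handed to whatever whole-segment model is needed (for example a language classifier), without the streaming stage knowing about it.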

and the gender and music classification is fantastic.

Our Enterprise STT pipeline does classify gender, but this is done via a lightweight speaker encoder.
So theoretically it can be abstracted away into a part of silero-vad.

As for music, it can also be done as a separate head, but in our case there is no clear business task that needs music / gender / speaker ID right now, so I do not know.

AdolfVonKleist (Author) · 2021-01-21T14:24:50Z

Thanks for your reply!

Well, the network architecture is actually public in the JIT / ONNX files

Good point, I haven't had a chance to pull it out and play with it yet, but I will.

We actually position our VAD as a finished solution (not as a toolkit) and there may be some reasons why it cannot be easily separated from our internal processes

Makes sense.

Judging by first impressions, am I correct to assume that:

I'd say it is 'fast enough' out of the box (OOTB) on CPU. I agree that it has more components, and that it is certainly not positioned as a product. The gender tagging of the default OOTB model works well regardless of language.

It is also not suitable OOTB for streaming, but works fine on short audio segments, or combined with webrtc.
But there is a pretty complete description of the framework in the accompanying paper.

Anyway I was just curious if you had also considered it in your comparisons/review.
I'm excited to take silero-vad for a spin as soon as I get a chance.

snakers4 (Maintainer) · 2021-01-21T15:09:12Z

I'm excited to take silero-vad for a spin as soon as I get a chance.

Which test cases do you have in mind?
Some structured real life testing would not hurt (though we naturally did it ourselves in some form or another).

I have heard some unverified reports about the VAD itself (I believe the public does not yet understand why number detection is useful):

Works well on Ukrainian;
Reportedly does not work on Hindi languages (which is strange);
Works well on Azerbaijani.

AdolfVonKleist (Author) · 2021-01-26T16:05:59Z

We look at a variety of European languages, largely telephony. I'll try to report back with some test results.
One other algorithm/implementation you might find interesting (super, super lightweight):

snakers4 (Maintainer) · 2021-01-26T16:18:53Z

https://github.com/pytorch/audio/blob/master/examples/interactive_asr/vad.py

I read the code; AFAIK this is an energy-threshold-based algorithm ported from sox into PyTorch.
I believe all energy-based algorithms should perform more or less the same.
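
For readers unfamiliar with the term, a generic energy-threshold VAD can be sketched in a few lines. This is not the sox or torchaudio implementation; the frame length, the threshold, and the 16 kHz sample rate are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_vad(
    samples: Sequence[float],
    frame_len: int = 160,      # assumed: 10 ms at 16 kHz
    threshold: float = 0.01,   # assumed: fixed, hand-picked threshold
) -> List[bool]:
    """Flag a frame as speech when its mean energy exceeds the threshold."""
    return [e > threshold for e in frame_energies(samples, frame_len)]


# Synthetic input: two silent frames followed by two frames of a 440 Hz tone.
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
flags = energy_vad(silence + tone)
print(flags)
```

Because the decision depends only on frame energy, loud non-speech such as music or noise passes the threshold just like speech does, which is one reason such algorithms tend to behave alike.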

snakers4 (Maintainer) · 2021-01-26T16:20:33Z

Ah, I thought you were referring to the torchaudio VAD.
This is a different one.
Many thanks.

snakers4 (Maintainer) · 2021-01-27T08:06:22Z

In the end, we decided simply to apply our validation scheme both to the VAD you sent above and to the VAD from torchaudio.
We shall see; I will post an update here as well.

snakers4 (Maintainer) · 2021-01-28T08:10:19Z

https://github.com/pytorch/audio/blob/master/examples/interactive_asr/vad.py

We tried this, with the following results.

Note, though, that our test is extremely "hard".
