Some VAD tests #23
Replies: 8 comments
-
Hi,
Well, the network architecture is actually public in the JIT / ONNX files. We position our VAD as a finished solution (not as a toolkit), and there may be reasons why it cannot easily be separated from our internal processes.
Many thanks, I gave it a quick look just now. Judging by first impressions, am I correct to assume that:
Do you know whether its VAD can somehow be used on a small chunk of audio rather than the whole file? Other heads of our VAD (number detector, language classifier) are also used on whole files, but that is not a problem, because you can first apply the VAD in a streaming or non-streaming fashion and then process the rest of the data in memory.
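To illustrate the "apply the VAD first, then process the rest in memory" pattern, here is a minimal Python sketch. It assumes a hypothetical per-chunk classifier `is_speech` (a stand-in for whatever VAD model is used, not a real API). Consecutive speech chunks are merged into segments, and the speech audio is then gathered in memory so that other heads can run on it afterwards.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical per-chunk classifier: returns True if the chunk contains speech.
# In practice this would wrap a real VAD model; here it is only a stand-in.
ChunkClassifier = Callable[[Sequence[float]], bool]


def stream_vad(
    samples: Sequence[float],
    chunk_size: int,
    is_speech: ChunkClassifier,
) -> List[Tuple[int, int]]:
    """Run a VAD chunk by chunk and merge consecutive speech chunks.

    Returns (start, end) sample indices of speech segments, end exclusive.
    """
    segments: List[Tuple[int, int]] = []
    current_start: Optional[int] = None
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        if is_speech(chunk):
            if current_start is None:
                current_start = start  # a new speech segment begins here
        elif current_start is not None:
            segments.append((current_start, start))  # segment ended before this chunk
            current_start = None
    if current_start is not None:
        # Audio ended while still in speech: close the open segment.
        segments.append((current_start, len(samples)))
    return segments


def collect_speech(
    samples: Sequence[float], segments: List[Tuple[int, int]]
) -> List[List[float]]:
    """Gather the speech audio in memory for downstream heads."""
    return [list(samples[s:e]) for s, e in segments]


if __name__ == "__main__":
    # Toy stand-in classifier: a chunk is "speech" if any sample is loud.
    audio = [0.0] * 4 + [0.5] * 8 + [0.0] * 4
    loud = lambda chunk: max(abs(x) for x in chunk) > 0.1
    segments = stream_vad(audio, chunk_size=4, is_speech=loud)
    print(segments)  # [(4, 12)]
```

The chunk size trades latency against how precisely segment boundaries are located; the same loop works whether chunks arrive live or are read from a file.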
Our Enterprise STT pipeline does classify gender, but this is done via a lightweight speaker encoder. As for music, it could also be done as a separate head, but in our case there is currently no clear business task that needs music / gender / speaker ID detection, so I do not know.
-
Thanks for your reply!
Good point, I haven't had a chance to pull it out and play with it yet, but I will.
Makes sense.
I'd say it is 'fast enough' OOTB on CPU, and I agree there are more components, and also that it is certainly not positioned as a product. The gender tagging in the default OOTB model works well regardless of language. It is also not suitable OOTB for streaming, but it works fine on short audio segments or when combined with WebRTC. Anyway, I was just curious whether you had also considered it in your comparisons/review.
-
Which test cases do you have in mind? I have heard some unverified reports that the VAD itself (I believe the public does not yet understand why number detection is useful):
-
We look at a variety of European languages, largely telephony audio. I'll try to report back with some test results.
-
I read the code; AFAIK this is an energy-threshold-based algorithm ported from SoX into PyTorch.
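For context, an energy-threshold detector of the kind described works roughly as follows. The minimal sketch below is a simplified illustration, not a port of the SoX or torchaudio code: it computes the RMS energy of each frame in dB, marks frames above a threshold as speech, and keeps the speech label on for a few extra frames ("hangover") so word endings are not chopped off. The `threshold_db` and `hangover` defaults are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def frame_energies_db(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping full frame, in dB.

    A trailing partial frame is ignored; RMS is floored to avoid log10(0).
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        energies.append(20 * math.log10(max(rms, 1e-10)))
    return energies


def energy_vad(
    samples: Sequence[float],
    frame_len: int,
    threshold_db: float = -40.0,
    hangover: int = 2,
) -> List[bool]:
    """Label each frame as speech (True) or non-speech (False)."""
    labels = []
    remaining = 0  # hangover frames left after the last loud frame
    for energy in frame_energies_db(samples, frame_len):
        if energy > threshold_db:
            remaining = hangover
            labels.append(True)
        elif remaining > 0:
            remaining -= 1
            labels.append(True)  # still within the hangover window
        else:
            labels.append(False)
    return labels
```

A fixed threshold like this is cheap but sensitive to background noise level, which is one reason such detectors tend to struggle on noisy telephony audio.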
-
Ah, I thought you were referring to the torchaudio VAD.
-
In the end, we decided simply to apply our validation scheme to the VAD you sent above and to the VAD from
-
We tried this, with the following results; note, though, that our test is extremely "hard".
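For anyone wanting to run a comparison of this kind, one common way to score a VAD is with frame-level classification metrics against hand-labelled reference audio. The sketch below is a generic example of such metrics, not the validation scheme used in this thread (which is not described here). It assumes both inputs are per-frame speech/non-speech labels at the same frame rate.

```python
from typing import Dict, Sequence


def frame_metrics(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Dict[str, float]:
    """Frame-level precision, recall, F1 and accuracy for speech detection."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of frames")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # speech correctly detected
    fp = sum(1 for r, p in pairs if not r and p)      # non-speech flagged as speech
    fn = sum(1 for r, p in pairs if r and not p)      # speech missed
    tn = len(pairs) - tp - fp - fn                    # non-speech correctly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

Comparing two VADs then amounts to running both on the same annotated audio, converting their outputs to per-frame labels at a common frame rate, and comparing the resulting metrics.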
-
❓ Questions and Help
This looks great; I saw your post on the KAIST VAD repo! I have two questions: