Request for further fine-tuning on Romanian language #336

gabitza-tech · 2023-05-04T10:56:21Z

🚀 Feature

Further fine-tuning on Romanian datasets / Latin languages, plus more transparency about the model architecture and comparisons with other VAD models (as of 2023). I would greatly appreciate it if you took these aspects into consideration in future releases.

Motivation

I have integrated the VAD component into a diarization system. It is crucial for extracting good speaker representations, free of noise and silence. Until now, I have used the vad_multilingual_marblenet model (https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/vad_multilingual_marblenet), and it was good enough. However, I am running in a CPU-only environment, where it is quite slow. It is also not fine-tuned on any Romanian dataset. Silero runs 10-15 times faster than MarbleNet, but its FA+MISS and DER are much higher.

For my own custom dataset in Romanian:

Diarization using vad_multilingual_marblenet: 0.1245 (DER) / 0.099 (FA+MISS)
Diarization using Silero VAD: 0.2745 (DER) / 0.17 (FA+MISS)
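
For readers unfamiliar with the FA+MISS figure: it is the part of the diarization error that comes from speech detection alone (false alarms plus missed speech), usually expressed as a fraction of the annotated speech time. The following is a minimal sketch of that arithmetic, not the scoring code used for the numbers above. The function names, the 10 ms frame size, and the toy segments are assumptions made for illustration, and overlapping speakers are ignored.

```python
def to_frames(segments, n_frames, frame_s=0.01):
    """Rasterise (start_s, end_s) speech segments into a list of 0/1 frame labels."""
    frames = [0] * n_frames
    for start, end in segments:
        lo = max(0, round(start / frame_s))
        hi = min(n_frames, round(end / frame_s))
        for i in range(lo, hi):
            frames[i] = 1
    return frames


def fa_miss_rate(reference, hypothesis, total_s, frame_s=0.01):
    """Return (false_alarm, miss), each as a fraction of reference speech time."""
    n = round(total_s / frame_s)
    ref = to_frames(reference, n, frame_s)
    hyp = to_frames(hypothesis, n, frame_s)
    ref_speech = sum(ref)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")
    # False alarm: the VAD says speech where the annotation says non-speech.
    fa = sum(1 for r, h in zip(ref, hyp) if h and not r)
    # Miss: the VAD says non-speech where the annotation says speech.
    miss = sum(1 for r, h in zip(ref, hyp) if r and not h)
    return fa / ref_speech, miss / ref_speech


# Hypothetical toy data: 6.0 s of annotated speech in a 10 s recording.
reference = [(1.0, 4.0), (6.0, 9.0)]
hypothesis = [(0.5, 4.0), (6.5, 9.0)]  # starts 0.5 s early once, 0.5 s late once
fa, miss = fa_miss_rate(reference, hypothesis, total_s=10.0)
print(f"FA={fa:.4f} MISS={miss:.4f} FA+MISS={fa + miss:.4f}")
```

DER adds speaker confusion on top of these two terms, so a gap in FA+MISS between two VADs carries over directly into DER.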

Note: I have tuned the threshold for Silero on the dataset, while for NeMo I haven't done any tuning. Is there any reason for the discrepancy in performance, even though MarbleNet was not trained on Romanian and Silero was?
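
Threshold tuning of the kind mentioned in the note can be sketched as a simple sweep: take per-frame speech probabilities, binarise them at each candidate threshold, and keep the threshold with the lowest FA+MISS. This is only an illustration under stated assumptions. `sweep_threshold` and the toy arrays are hypothetical, and in practice the probabilities would come from the VAD itself (silero-vad exposes a `threshold` argument on `get_speech_timestamps`).

```python
def sweep_threshold(probs, labels, thresholds):
    """Return (best_threshold, best_error), where error = (FA + MISS) / speech frames."""
    speech = sum(labels)
    if speech == 0:
        raise ValueError("labels contain no speech")
    best = None
    for t in thresholds:
        fa = sum(1 for p, y in zip(probs, labels) if p >= t and not y)
        miss = sum(1 for p, y in zip(probs, labels) if p < t and y)
        err = (fa + miss) / speech
        # Strict '<' keeps the lowest threshold among ties.
        if best is None or err < best[1]:
            best = (t, err)
    return best


# Hypothetical per-frame speech probabilities and reference labels (1 = speech).
probs = [0.03, 0.22, 0.47, 0.62, 0.91, 0.96, 0.71, 0.38, 0.12, 0.04]
labels = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
thresholds = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best_t, best_err = sweep_threshold(probs, labels, thresholds)
print(f"best threshold={best_t:.2f}, FA+MISS={best_err:.2f}")
```

A threshold chosen this way on the evaluation set itself is optimistic; holding out part of the data for the sweep gives a fairer comparison against an untuned model.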

Pitch

Thanks to its speed, Silero VAD is a really viable option, especially for commercial purposes. Could you consider fine-tuning on more Romanian audio, and on other Latin languages, in future releases? Could you also share how much audio the training data contains for, e.g., Romanian, Spanish, Italian, and French?

Alternatives

The capability to fine-tune the model would be amazing.

Answered by snakers4

snakers4 · 2023-05-04T12:05:18Z

Maintainer

Hi,

Let's separate these numerous questions into buckets:

> Further fine-tuning on Romanian datasets / latin languages

Since you are developing a commercial application, you are welcome to DM us on Telegram (preferred) or by email.

> more transparency with the architecture of the model

We provide our VAD with "batteries included", so this is basically out of scope for us.

> and comparisons with other VAD detectors (at this time in 2023)

These metrics were updated at the end of 2022: https://github.com/snakers4/silero-vad/wiki/Quality-Metrics

Naturally we tested only streaming performance:

> I have integrated the VAD component in a Diarization system.
> It is also not finetuned on any Romanian dataset. Silero runs 10-15 times faster than Marblenet, but the FA+MISS / DER are much higher..

We have not tested or optimized our VAD for use in diarization.
I can only say that MarbleNet is not an online solution, and it is probably 1-2 orders of magnitude slower than our VAD.
It has a very large window, which may be preferable for diarization, since it basically "cheats" and looks into the "future".
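
The effect of looking into the "future" can be illustrated with a toy smoother. The sketch below is not how either MarbleNet or Silero VAD works internally. It only contrasts a causal average (past frames only, usable in streaming) with a centered average (needs frames that have not arrived yet) on a hypothetical probability track. All names and values are assumptions.

```python
def causal_mean(probs, window):
    """Average each frame with up to `window - 1` past frames (no lookahead)."""
    out = []
    for i in range(len(probs)):
        chunk = probs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(probs, half_window):
    """Average each frame with `half_window` frames on each side (uses future frames)."""
    out = []
    for i in range(len(probs)):
        chunk = probs[max(0, i - half_window): i + half_window + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def first_above(values, threshold=0.5):
    """Index of the first frame whose smoothed score reaches the threshold."""
    return next(i for i, v in enumerate(values) if v >= threshold)


# Hypothetical raw scores with a speech onset at frame 4.
probs = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

causal_onset = first_above(causal_mean(probs, window=3))
centered_onset = first_above(centered_mean(probs, half_window=1))
print(f"causal onset: frame {causal_onset}, centered onset: frame {centered_onset}")
```

With the same three-frame span, the centered version places the onset on the true frame, while the causal version reports it one frame late. The price is that the centered decision for a frame can only be emitted after the following frame has been seen, which is acceptable for offline diarization but not for streaming.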

> For my own custom dataset in Romanian:
> Is there any reason for the discrepancies in performance? Even though marblenet was not trained on romanian and Silero was?

I am not familiar with your dataset or your diarization metrics, but my guess is that longer chunks may be beneficial for diarization. We cannot really tell without looking at your dataset and benchmark code.
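
One way to obtain longer chunks from any VAD output is to merge segments separated by short pauses and pad the edges before passing them to the speaker-embedding stage. The sketch below is a generic post-processing step under stated assumptions, not a recommendation from the maintainers. `merge_and_pad` and its default values are hypothetical. silero-vad's `get_speech_timestamps` has related built-in knobs (`min_silence_duration_ms`, `speech_pad_ms`), which are worth checking against the current documentation first.

```python
def merge_and_pad(segments, max_gap_s=0.3, pad_s=0.1, total_s=None):
    """Merge speech segments separated by short pauses, then pad both edges.

    Assumes 2 * pad_s <= max_gap_s, so padded segments never overlap.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_s:
            # Pause is short enough: extend the previous segment.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    padded = []
    for start, end in merged:
        start = max(0.0, start - pad_s)
        end = end + pad_s if total_s is None else min(total_s, end + pad_s)
        padded.append((start, end))
    return padded


# Hypothetical VAD output in seconds: the first two segments are 0.2 s apart.
raw = [(0.5, 1.2), (1.4, 2.0), (3.0, 3.5)]
chunks = merge_and_pad(raw, max_gap_s=0.3, pad_s=0.1, total_s=10.0)
print(chunks)
```

Larger `max_gap_s` values give longer chunks for the embedding model but also admit more non-speech, which raises the false-alarm part of FA+MISS, so the value is best tuned on held-out data.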

If you would like us to help you tune the parameters for optimal performance on your domain (or just check that you are using our VAD correctly), we can discuss it commercially; please DM us on Telegram (preferred) or by email.

> Could you consider fine-tuning, in future releases, on more audio in romanian/other latin languages? Could you provide some information about the quantity of audio for example: romanian, spanish, italian, french, etc?

Most likely we will not be focusing on these languages, for reasons that are out of scope for a technical discussion.
As for the dataset composition, I will look it up shortly.

> The capability to fine-tune the model would be amazing.

This is planned for this year, but the decision to invest time in it depends on factors outside our control.

2 replies

snakers4 May 4, 2023
Maintainer

The dataset contains Romanian, albeit a very small amount.

gabitza-tech May 4, 2023
Author

Thank you for your detailed and fast response! Unfortunately, I cannot disclose the database, as it contains sensitive information, but I greatly appreciate your clarifications! It is indeed a much faster solution and would probably be more suited for online tasks. I hope that by the end of the year it may be possible to fine-tune the model too. I will eagerly wait!

snakers4 · 2024-04-16T07:52:50Z

Maintainer

As a first step, we released the dataset: https://github.com/snakers4/silero-vad/tree/master/datasets

1 reply

gabitza-tech Apr 16, 2024
Author

Hey @snakers4! Thank you for the follow-up!
