I am exploring whisper.cpp for transcribing Italian recordings. After compiling whisper.cpp in WSL2 with CUDA support, the provided JFK samples transcribed without any issues. However, when I transcribed my own recordings with the largest quantized model available, the results were considerably worse. Attached is a snippet of the audio I am working with, which I converted to a WAV file using FFmpeg, as outlined in the README. Upon running Whisper with the command
The transcription captured some segments accurately but struggled significantly with others, even though the audio sounds similar throughout. Adjusting parameters such as beam size and entropy threshold brought minimal improvement, if any. What do the exclamation points mean? What could be going wrong? Could the issue be the sampling rate of the WAV file? Is there a flaw in my setup? I am also open to preprocessing steps that could improve transcription of the original audio. I appreciate any assistance or guidance. Thank you.
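On the sampling-rate question, one quick way to rule it out is to read the WAV header and compare it with what the README's FFmpeg command produces (16 kHz, mono, 16-bit PCM). Below is a minimal sketch using only Python's standard `wave` module. The function names `describe_wav` and `matches_readme_format` are made up for illustration, and `output.wav` in the usage note is a placeholder path.

```python
import wave


def describe_wav(path):
    """Return the basic PCM parameters stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate,
        }


def matches_readme_format(info):
    """True if the file is 16 kHz, mono, 16-bit (assumed target format)."""
    return (
        info["sample_rate_hz"] == 16000
        and info["channels"] == 1
        and info["bits_per_sample"] == 16
    )
```

Calling `matches_readme_format(describe_wav("output.wav"))` on the converted file should return `True` if the conversion went as intended. If it does, the sampling rate is probably not the culprit and the audio quality itself is the more likely suspect.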
With afftdn noise reduction applied, the output is definitely better than your previous one, and with some tuning of afftdn the transcription should become more precise.
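One hedged way to explore that afftdn tuning is to generate a small grid of denoised files and run each through whisper.cpp to compare transcripts. The sketch below only builds the FFmpeg command lines. The helper names (`afftdn_command`, `sweep`, `run_sweep`), the output file naming, and the value grids are all arbitrary choices for illustration, not recommended settings. The `nr`, `nf`, and `tn` options are the same ones used in this thread.

```python
import itertools
import subprocess


def afftdn_command(src, dst, nr, nf, track_noise=True):
    """Build an ffmpeg command that denoises src into dst with afftdn.

    nr: noise reduction in dB, nf: noise floor in dB, tn: noise tracking.
    """
    af = f"afftdn=nr={nr}:nf={nf}:tn={1 if track_noise else 0}"
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", src, "-af", af, dst]


def sweep(src, nr_values=(10, 20, 30), nf_values=(-40, -30, -20)):
    """Yield one command per (nr, nf) pair; the grids are illustrative only."""
    for nr, nf in itertools.product(nr_values, nf_values):
        dst = f"output_nr{nr}_nf{abs(nf)}.wav"  # hypothetical naming scheme
        yield afftdn_command(src, dst, nr, nf)


def run_sweep(src):
    """Run every command in the sweep; requires ffmpeg on PATH."""
    for cmd in sweep(src):
        subprocess.run(cmd, check=True)
```

For example, `run_sweep("output_dirty.wav")` would write nine denoised variants, each of which can then be transcribed and compared by ear and by transcript.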
Sorry, I've only just seen the email for some reason. You can find me on
Telegram at @zubbyTM.
On Wed, 13 Mar 2024 at 17:38, i4lina ***@***.***>
wrote:
… Hey, can I have some contact info to keep this conversation going
somewhere more appropriate? @zubbyy <https://github.com/zubbyy>
Ran on Arch Linux 6.7.5, Hyprland, i7-8565U, NVIDIA MX130, 8 GB RAM.
Hey.
I'm no expert in this field, but since I'll need whisper.cpp for a future project and wanted to get more familiar with it, I took a look at your issue.
My approach was noise reduction, so I downloaded your mp4.
I first converted the first 100 seconds of it to a 16 kHz, mono, 16-bit .wav file:
ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le -t 100 output_dirty.wav
For the noise reduction I used FFmpeg's afftdn filter:
ffmpeg -i output_dirty.wav -af "afftdn=nr=20:nf=-20:tn=1" output.wav
I then ran the result through my large model:
./main -m models/ggml-large-v3.bin -l auto samples/output.wav
The output was the following: