Error in uploading the voice cloning sample silence_threshold #388

Closed
MrE3gman opened this issue Mar 2, 2025 · 3 comments

MrE3gman commented Mar 2, 2025

Script Mode

  • Docker image

Process Mode

  • Gradio GUI

Operating System:

  • Windows

Describe the bug
Uploading an audio sample for voice cloning that previously worked (a 6-second WAV at 24 kHz) now throws this error in the GUI:

Error
extract_voice() error: _trim_and_clean() error: cannot access local variable 'silence_threshold' where it is not associated with a value

Terminal log

2025-03-02 15:00:18 IPs available for connection:
2025-03-02 15:00:18 ['127.0.0.1', '::1', '172.19.0.2']
2025-03-02 15:00:18 Note: 0.0.0.0 is not the IP to connect. Instead use an IP above to connect.
2025-03-02 15:00:18 * Running on local URL: http://0.0.0.0:7860
2025-03-02 15:00:18
2025-03-02 15:00:18 To create a public link, set share=True in launch().
2025-03-02 15:00:54 Input file valid
2025-03-02 15:00:54 ffmpeg version 5.1.6-0+deb12u1 Copyright (c) 2000-2024 the FFmpeg developers
2025-03-02 15:00:54 built with gcc 12 (Debian 12.2.0-14)
2025-03-02 15:00:54 configuration: --prefix=/usr --extra-version=0+deb12u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
2025-03-02 15:00:54 libavutil 57. 28.100 / 57. 28.100
2025-03-02 15:00:54 libavcodec 59. 37.100 / 59. 37.100
2025-03-02 15:00:54 libavformat 59. 27.100 / 59. 27.100
2025-03-02 15:00:54 libavdevice 59. 7.100 / 59. 7.100
2025-03-02 15:00:54 libavfilter 8. 44.100 / 8. 44.100
2025-03-02 15:00:54 libswscale 6. 7.100 / 6. 7.100
2025-03-02 15:00:54 libswresample 4. 7.100 / 4. 7.100
2025-03-02 15:00:54 libpostproc 56. 6.100 / 56. 6.100
2025-03-02 15:00:54 Guessed Channel Layout for Input Stream #0.0 : stereo
2025-03-02 15:00:54 Input #0, wav, from '/tmp/gradio/40e6d78261744dbf168bba408d40d0fcc48bd669a2122af4551b505c55e52bae/claudio.wav':
2025-03-02 15:00:54 Duration: 00:00:05.83, bitrate: 768 kb/s
2025-03-02 15:00:54 Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, stereo, s16, 768 kb/s
2025-03-02 15:00:54 Stream mapping:
2025-03-02 15:00:54 Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
2025-03-02 15:00:54 Press [q] to stop, [?] for help
2025-03-02 15:00:54 Output #0, wav, to '/home/user/app/voices/__sessions/voice-dc82c937-1c86-4b47-a6d1-107bb14fba0b/claudio.wav':
2025-03-02 15:00:54 Metadata:
2025-03-02 15:00:54 ISFT : Lavf59.27.100
2025-03-02 15:00:54 Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
2025-03-02 15:00:54 Metadata:
2025-03-02 15:00:54 encoder : Lavc59.37.100 pcm_s16le
2025-03-02 15:00:54 size= 4kB time=00:00:00.04 bitrate= 720.5kbits/s speed=N/A
2025-03-02 15:00:54 size= 503kB time=00:00:05.83 bitrate= 705.7kbits/s speed= 557x
2025-03-02 15:00:54 video:0kB audio:503kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.015157%
2025-03-02 15:00:54 Conversion to .wav format for processing successful
2025-03-02 15:00:55 Noise Score: 4225.04
2025-03-02 15:00:55 No background noise or music detected. Skipping separation.
2025-03-02 15:00:55 Traceback (most recent call last):
2025-03-02 15:00:55 File "/home/user/app/lib/classes/voice_extractor.py", line 148, in _trim_and_clean
2025-03-02 15:00:55 self._remove_silences(audio, silence_threshold)
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 UnboundLocalError: cannot access local variable 'silence_threshold' where it is not associated with a value
2025-03-02 15:00:55
2025-03-02 15:00:55 During handling of the above exception, another exception occurred:
2025-03-02 15:00:55
2025-03-02 15:00:55 Traceback (most recent call last):
2025-03-02 15:00:55 File "/home/user/app/lib/classes/voice_extractor.py", line 292, in extract_voice
2025-03-02 15:00:55 success, msg = self._trim_and_clean()
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/app/lib/classes/voice_extractor.py", line 210, in _trim_and_clean
2025-03-02 15:00:55 raise ValueError(error)
2025-03-02 15:00:55 ValueError: _trim_and_clean() error: cannot access local variable 'silence_threshold' where it is not associated with a value
2025-03-02 15:00:55
2025-03-02 15:00:55 During handling of the above exception, another exception occurred:
2025-03-02 15:00:55
2025-03-02 15:00:55 Traceback (most recent call last):
2025-03-02 15:00:55 File "/home/user/.local/lib/python3.12/site-packages/gradio/queueing.py", line 625, in process_events
2025-03-02 15:00:55 response = await route_utils.call_process_api(
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/.local/lib/python3.12/site-packages/gradio/route_utils.py", line 322, in call_process_api
2025-03-02 15:00:55 output = await app.get_blocks().process_api(
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/.local/lib/python3.12/site-packages/gradio/blocks.py", line 2096, in process_api
2025-03-02 15:00:55 result = await self.call_function(
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1643, in call_function
2025-03-02 15:00:55 prediction = await anyio.to_thread.run_sync( # type: ignore
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/.local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
2025-03-02 15:00:55 return await get_async_backend().run_sync_in_worker_thread(
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
2025-03-02 15:00:55 return await future
2025-03-02 15:00:55 ^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 962, in run
2025-03-02 15:00:55 result = context.run(func, *args)
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/.local/lib/python3.12/site-packages/gradio/utils.py", line 890, in wrapper
2025-03-02 15:00:55 response = f(*args, **kwargs)
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/app/lib/functions.py", line 1887, in change_gr_voice_file
2025-03-02 15:00:55 status, msg = extractor.extract_voice()
2025-03-02 15:00:55 ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-02 15:00:55 File "/home/user/app/lib/classes/voice_extractor.py", line 299, in extract_voice
2025-03-02 15:00:55 raise ValueError(msg)
2025-03-02 15:00:55 ValueError: extract_voice() error: _trim_and_clean() error: cannot access local variable 'silence_threshold' where it is not associated with a value
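
For readers unfamiliar with this exception: an UnboundLocalError of this kind usually means a local name is assigned on only some code paths and then read on all of them. The sketch below is a minimal illustration of that pattern and one common remedy (binding a default before branching). It is not the project's code: the function names, the `noise_detected` flag, and the threshold values are all hypothetical, and the actual cause in `voice_extractor.py` may differ.

```python
# Hypothetical default; the real project may use a different value or unit.
DEFAULT_SILENCE_THRESHOLD_DB = -60.0


def trim_buggy(noise_detected: bool) -> float:
    """Binds silence_threshold in only one branch, then reads it unconditionally."""
    if noise_detected:
        silence_threshold = -40.0  # hypothetical value
    # When noise_detected is False, the name was never bound in this scope,
    # so the read below raises UnboundLocalError.
    return silence_threshold


def trim_fixed(noise_detected: bool) -> float:
    """Binds a default before branching so every path has a value."""
    silence_threshold = DEFAULT_SILENCE_THRESHOLD_DB
    if noise_detected:
        silence_threshold = -40.0  # hypothetical value
    return silence_threshold


def reproduces_error() -> bool:
    """Returns True if the buggy variant fails on the 'no noise' path."""
    try:
        trim_buggy(noise_detected=False)
    except UnboundLocalError:
        return True
    return False
```

Calling `trim_buggy(noise_detected=False)` fails while `trim_fixed(noise_detected=False)` returns the default, which matches the general shape of the report: the error appears right after the "Skipping separation" message.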

@ROBERT-MCDOWELL
Collaborator

Please try the latest native version from the git repo rather than the Docker image, which is behind the latest version for now.

agentxan commented Mar 2, 2025

I also get this error using the latest from git in headless mode.

v25.2.27 native mode
Input file valid
ffmpeg version 7.1-full_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libopenjpeg --enable-libquirc --enable-libuavs3d --enable-libxevd --enable-libzvbi --enable-libqrencode --enable-librav1e --enable-libsvtav1 --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxeve --enable-libxvid --enable-libaom --enable-libjxl --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-liblc3 --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.100 / 61. 19.100
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
[aist#0:0/pcm_s16le @ 000002843d82f780] Guessed Channel Layout: stereo
Input #0, wav, from 'D:\ebooktest\voices\jenna.wav':
Duration: 00:00:09.60, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'd:\ebook2audiobook\voices__sessions\voice-798c4557-a1f6-4fc5-88a2-777a7196d97c\jenna.wav':
Metadata:
ISFT : Lavf61.7.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Metadata:
encoder : Lavc61.19.100 pcm_s16le
[out#0/wav @ 000002843d82f940] video:0KiB audio:827KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.009212%
size= 827KiB time=00:00:09.60 bitrate= 705.7kbits/s speed= 425x
Conversion to .wav format for processing successful
Noise Score: 5270.12
No background noise or music detected. Skipping separation.
convert_ebook() Exception: extract_voice() error: _trim_and_clean() error: cannot access local variable 'silence_threshold' where it is not associated with a value
Conversion failed: extract_voice() error: _trim_and_clean() error: cannot access local variable 'silence_threshold' where it is not associated with a value
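
The stacked prefixes in that message ("extract_voice() error: _trim_and_clean() error: ...") are what you get when each layer catches an exception and re-raises it with its own name prepended. The sketch below shows that pattern under assumptions: the functions are stand-ins named after the ones in the log, not the project's implementation, and the use of `raise ... from` is a choice made here to keep the original exception attached as `__cause__`.

```python
ROOT_MESSAGE = (
    "cannot access local variable 'silence_threshold' "
    "where it is not associated with a value"
)


def trim_and_clean_standin() -> None:
    """Stand-in for the innermost step: wraps the root failure in a ValueError."""
    try:
        raise UnboundLocalError(ROOT_MESSAGE)
    except Exception as exc:
        raise ValueError(f"_trim_and_clean() error: {exc}") from exc


def extract_voice_standin() -> None:
    """Stand-in for the outer step: prepends its own name and re-raises."""
    try:
        trim_and_clean_standin()
    except Exception as exc:
        raise ValueError(f"extract_voice() error: {exc}") from exc


def root_cause(exc: BaseException) -> BaseException:
    """Follows the __cause__ chain down to the original exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc
```

Walking the chain with `root_cause` recovers the original UnboundLocalError, which is the part worth quoting in a bug report.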

ROBERT-MCDOWELL added a commit to ROBERT-MCDOWELL/ebook2audiobook that referenced this issue Mar 2, 2025
@ROBERT-MCDOWELL
Collaborator

Fixed and merged.
