Unable to generate voice using "OpenAI API"
Create the pip environment, then run:
auralis.openai --host 0.0.0.0 --port 8000 --model AstraMindAI/xttsv2 --gpt_model AstraMindAI/xtts2-gpt --max_concurrency 8 --vllm_logging_level warn
Then try to generate audio through the http://192.168.0.14:8000/v1/audio/speech endpoint with the following request body:
{ "input": "this is a test", "model": "xttsv2", "voice": [ "/examples/ncage.wav" ], "response_format": "wav", "speed": 0, "enhance_speech": false, "language": "auto", "max_ref_length": 60, "gpt_cond_len": 30, "gpt_cond_chunk_len": 4, "temperature": 0.75, "top_p": 0.85, "top_k": 50, "repetition_penalty": 5, "length_penalty": 1, "do_sample": true }
curl -X 'POST' \
  'http://192.168.0.14:8000/v1/audio/speech' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "input": "this is a test", "model": "xttsv2", "voice": [ "/examples/ncage.wav" ], "response_format": "wav", "speed": 0, "enhance_speech": false, "language": "auto", "max_ref_length": 60, "gpt_cond_len": 30, "gpt_cond_chunk_len": 4, "temperature": 0.75, "top_p": 0.85, "top_k": 50, "repetition_penalty": 5, "length_penalty": 1, "do_sample": true }'
The voice is present in the /examples folder.
Expected behavior: the voice is generated.
Actual behavior: I receive:
{ "detail": [ { "type": "value_error", "loc": [ "body", "voice" ], "msg": "Value error, Invalid base64 encoding in voice file", "input": [ "/examples/ncage.wav" ], "ctx": { "error": {} } } ] }
INFO: 192.168.0.69:63960 - "POST /v1/audio/speech HTTP/1.1" 422 Unprocessable Entity
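The validation message ("Invalid base64 encoding in voice file") suggests that the server tries to base64-decode each entry of `voice`, so a filesystem path such as `/examples/ncage.wav` would be rejected. The following is a minimal sketch of a client that sends the file contents instead. It assumes, without confirmation from the Auralis documentation, that each `voice` entry should be the base64-encoded bytes of a reference WAV file; the helper names `encode_voice`, `build_payload`, and `synthesize` are hypothetical, and only a subset of the request fields shown above is included.

```python
import base64
import json
import urllib.request
from pathlib import Path


def encode_voice(data: bytes) -> str:
    """Return the base64 text form of raw audio bytes."""
    return base64.b64encode(data).decode("ascii")


def build_payload(text: str, voice_bytes: bytes) -> dict:
    """Build a request body like the one in this report.

    Assumption: "voice" takes base64-encoded audio, not a file path.
    Other fields from the original request (temperature, top_p, ...)
    can be added to the dict unchanged.
    """
    return {
        "input": text,
        "model": "xttsv2",
        "voice": [encode_voice(voice_bytes)],
        "response_format": "wav",
    }


def synthesize(url: str, text: str, wav_path: str, out_path: str) -> None:
    """POST the payload and write the response body to out_path."""
    payload = build_payload(text, Path(wav_path).read_bytes())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on a 4xx/5xx response,
    # for example the 422 shown above.
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())
```

With the server from this report running, a call such as `synthesize("http://192.168.0.14:8000/v1/audio/speech", "this is a test", "/examples/ncage.wav", "output.wav")` would read the reference file on the client side, so the path has to exist on the machine that sends the request.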
Please run the following commands and include the output:
# OS Information
`Linux machinelearning 6.8.0-52-generic #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11 00:06:25 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux`
# Python version
`Python 3.12.3`
# Installed Python packages
Package Version
--------------------------------- -------------
aiofiles 24.1.0
aiohappyeyeballs 2.4.4
aiohttp 3.11.11
aiosignal 1.3.2
annotated-types 0.7.0
anyio 4.8.0
asttokens 3.0.0
attrs 25.1.0
audioread 3.0.1
auralis 0.2.8.post2
beautifulsoup4 4.13.1
blis 0.7.11
cachetools 5.5.1
catalogue 2.0.10
certifi 2025.1.31
cffi 1.17.1
charset-normalizer 3.4.1
click 8.1.8
cloudpathlib 0.20.0
cloudpickle 3.1.1
colorama 0.4.6
compressed-tensors 0.8.0
confection 0.1.5
cutlet 0.5.0
cymem 2.0.11
datasets 3.2.0
decorator 5.1.1
dill 0.3.8
diskcache 5.6.3
distro 1.9.0
docopt 0.6.2
EbookLib 0.18
einops 0.8.0
executing 2.2.0
fastapi 0.115.8
ffmpeg 1.4
filelock 3.17.0
frozenlist 1.5.0
fsspec 2024.9.0
fugashi 1.4.0
future 1.0.0
gguf 0.10.0
h11 0.14.0
hangul-romanize 0.1.0
httpcore 1.0.7
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.28.1
idna 3.10
importlib_metadata 8.6.1
iniconfig 2.0.0
interegular 0.3.3
ipython 8.32.0
jaconv 0.4.0
jedi 0.19.2
Jinja2 3.1.5
jiter 0.8.2
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
langcodes 3.5.0
langid 1.1.6
language_data 1.3.0
lark 1.2.2
lazy_loader 0.4
librosa 0.10.2.post1
llvmlite 0.44.0
lm-format-enforcer 0.10.9
lxml 5.3.0
marisa-trie 1.2.1
markdown-it-py 3.0.0
MarkupSafe 3.0.2
matplotlib-inline 0.1.7
mdurl 0.1.2
mistral_common 1.5.2
mojimoji 0.0.13
mpmath 1.3.0
msgpack 1.1.0
msgspec 0.19.0
multidict 6.1.0
multiprocess 0.70.16
murmurhash 1.0.12
nest-asyncio 1.6.0
networkx 3.4.2
num2words 0.5.14
numba 0.61.0
numpy 1.26.4
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-ml-py 12.570.86
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
openai 1.61.0
OpenCC 1.1.9
opencv-python-headless 4.11.0.86
outlines 0.0.46
packaging 24.2
pandas 2.2.3
parso 0.8.4
partial-json-parser 0.2.1.1.post5
pexpect 4.9.0
pillow 10.4.0
pip 24.0
platformdirs 4.3.6
pluggy 1.5.0
pooch 1.8.2
preshed 3.0.9
prometheus_client 0.21.1
prometheus-fastapi-instrumentator 7.0.2
prompt_toolkit 3.0.50
propcache 0.2.1
protobuf 5.29.3
psutil 6.1.1
ptyprocess 0.7.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
pyairports 2.1.1
pyarrow 19.0.0
pycountry 24.6.1
pycparser 2.22
pydantic 2.10.6
pydantic_core 2.27.2
Pygments 2.19.1
pyloudnorm 0.1.1
pypinyin 0.53.0
pytest 8.3.4
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
pytz 2025.1
PyYAML 6.0.2
pyzmq 26.2.1
ray 2.42.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rich 13.9.4
rpds-py 0.22.3
safetensors 0.5.2
scikit-learn 1.6.1
scipy 1.15.1
sentencepiece 0.2.0
setuptools 75.8.0
shellingham 1.5.4
six 1.17.0
smart-open 7.1.0
sniffio 1.3.1
sounddevice 0.5.1
soundfile 0.13.1
soupsieve 2.6
soxr 0.5.0.post1
spacy 3.7.5
spacy-legacy 3.0.12
spacy-loggers 1.0.5
srsly 2.5.1
stack-data 0.6.3
starlette 0.45.3
sympy 1.13.1
thinc 8.2.5
threadpoolctl 3.5.0
tiktoken 0.7.0
tokenizers 0.21.0
torch 2.5.1
torchaudio 2.5.1
torchvision 0.20.1
tqdm 4.67.1
traitlets 5.14.3
transformers 4.48.2
triton 3.1.0
typer 0.15.1
typing_extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
uvicorn 0.34.0
uvloop 0.21.0
vllm 0.6.4.post1
wasabi 1.1.3
watchfiles 1.0.4
wcwidth 0.2.13
weasel 0.4.1
websockets 14.2
wrapt 1.17.2
xformers 0.0.28.post3
xxhash 3.5.0
yarl 1.18.3
zipp 3.21.0
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77                 Driver Version: 565.77         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   38C    P8             17W /  370W |   19834MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     57527      C   python3                                      3410MiB |
|    0   N/A  N/A   1602657      C   ...unners/cuda_v12/ollama_llama_server      16408MiB |
+-----------------------------------------------------------------------------------------+
-bash: nvcc: command not found
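Because `nvcc` is not on the PATH here, the CUDA toolkit version cannot be read that way. As a hedged alternative, the sketch below reports the CUDA version that the installed PyTorch wheel was built against, via `torch.version.cuda`. The function name `cuda_versions` is hypothetical, and the value it returns describes the PyTorch build, which may differ from the driver's CUDA version printed by `nvidia-smi`.

```python
import shutil
import subprocess


def cuda_versions() -> dict:
    """Collect CUDA version hints from nvcc (if present) and PyTorch."""
    info = {"nvcc": None, "torch_cuda": None}

    # nvcc is only available when the CUDA toolkit is installed and on PATH.
    nvcc = shutil.which("nvcc")
    if nvcc:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=False
        )
        info["nvcc"] = result.stdout.strip() or None

    # torch.version.cuda is the CUDA version the wheel was built with
    # (None for CPU-only builds). Any import failure leaves the field unset.
    try:
        import torch

        info["torch_cuda"] = torch.version.cuda
    except Exception:
        pass

    return info
```

Printing `cuda_versions()` inside the same virtual environment that runs `auralis.openai` gives the values relevant to this report.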
## Possible Solutions
I have no clue, but I am willing to help!
The end goal is to use this GitHub project alongside https://github.com/sfortis/openai_tts