[BUG🐛] Unable to generate voices using OpenAI api #64

Open
maxi1134 opened this issue Feb 4, 2025 · 0 comments
Labels
bug Something isn't working


maxi1134 commented Feb 4, 2025

Bug Description

Unable to generate voice using "OpenAI API"

Minimal Reproducible Example

Create the pip environment, then start the server with `auralis.openai --host 0.0.0.0 --port 8000 --model AstraMindAI/xttsv2 --gpt_model AstraMindAI/xtts2-gpt --max_concurrency 8 --vllm_logging_level warn`.

Then try to generate audio by sending a POST request to the http://192.168.0.14:8000/v1/audio/speech endpoint with the following body (shown first on its own, then as the full curl command):

{
  "input": "this is a test",
  "model": "xttsv2",
  "voice": [
    "/examples/ncage.wav"
  ],
  "response_format": "wav",
  "speed": 0,
  "enhance_speech": false,
  "language": "auto",
  "max_ref_length": 60,
  "gpt_cond_len": 30,
  "gpt_cond_chunk_len": 4,
  "temperature": 0.75,
  "top_p": 0.85,
  "top_k": 50,
  "repetition_penalty": 5,
  "length_penalty": 1,
  "do_sample": true
}
curl -X 'POST' \
  'http://192.168.0.14:8000/v1/audio/speech' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "input": "this is a test",
  "model": "xttsv2",
  "voice": [
    "/examples/ncage.wav"
  ],
  "response_format": "wav",
  "speed": 0,
  "enhance_speech": false,
  "language": "auto",
  "max_ref_length": 60,
  "gpt_cond_len": 30,
  "gpt_cond_chunk_len": 4,
  "temperature": 0.75,
  "top_p": 0.85,
  "top_k": 50,
  "repetition_penalty": 5,
  "length_penalty": 1,
  "do_sample": true
}'

The voice is present in the /examples folder.

Expected Behavior

The voice is generated

Actual Behavior

I receive:

{
  "detail": [
    {
      "type": "value_error",
      "loc": [
        "body",
        "voice"
      ],
      "msg": "Value error, Invalid base64 encoding in voice file",
      "input": [
        "/examples/ncage.wav"
      ],
      "ctx": {
        "error": {}
      }
    }
  ]
}
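
The message "Invalid base64 encoding in voice file" suggests the server tries to decode each `voice` entry as base64 audio data rather than opening it as a file path. The following is a minimal sketch under that assumption, not a confirmed description of the Auralis API: it base64-encodes the reference WAV and puts the result in the `voice` list. The helper names `build_speech_payload`, `post_speech`, and `main` are hypothetical, and whether the omitted fields have server-side defaults is also an assumption.

```python
import base64
import json
import urllib.request


def build_speech_payload(wav_bytes: bytes, text: str) -> dict:
    """Build a request body with the reference audio inlined as base64.

    Assumption: the server base64-decodes each entry of "voice".
    This is inferred from the error message, not from documentation.
    """
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return {
        "input": text,
        "model": "xttsv2",
        "voice": [encoded],
        "response_format": "wav",
    }


def post_speech(url: str, payload: dict) -> bytes:
    """POST the JSON payload and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def main() -> None:
    # Path and URL are the ones from the report above; adjust as needed.
    with open("/examples/ncage.wav", "rb") as f:
        payload = build_speech_payload(f.read(), "this is a test")
    audio = post_speech("http://192.168.0.14:8000/v1/audio/speech", payload)
    with open("output.wav", "wb") as out:
        out.write(audio)
```

Calling `main()` on a machine that can read the WAV file would send the encoded audio instead of the path string. If the request still returns 422, the assumption about the expected `voice` format is wrong and the server's request schema should be checked.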

Error Logs

INFO:     192.168.0.69:63960 - "POST /v1/audio/speech HTTP/1.1" 422 Unprocessable Entity
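
The access log records only the 422 status; the reason is in the response body. A small sketch, assuming the body follows the `detail` list shape shown under Actual Behavior (the helper name `summarize_validation_errors` is hypothetical), that turns such a body into one readable line per error:

```python
import json


def summarize_validation_errors(body: str) -> list[str]:
    """Return one 'location: message' line per entry in "detail"."""
    lines = []
    for err in json.loads(body).get("detail", []):
        # "loc" is a list such as ["body", "voice"]; join it with dots.
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', '')}")
    return lines


# Trimmed copy of the response body from this report.
example_body = json.dumps({
    "detail": [
        {
            "type": "value_error",
            "loc": ["body", "voice"],
            "msg": "Value error, Invalid base64 encoding in voice file",
        }
    ]
})

for line in summarize_validation_errors(example_body):
    print(line)
```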

Environment

Please run the following commands and include the output:

# OS Information
`Linux machinelearning 6.8.0-52-generic #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11 00:06:25 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux`


# Python version
`Python 3.12.3`

# Installed Python packages


Package                           Version
--------------------------------- -------------
aiofiles                          24.1.0
aiohappyeyeballs                  2.4.4
aiohttp                           3.11.11
aiosignal                         1.3.2
annotated-types                   0.7.0
anyio                             4.8.0
asttokens                         3.0.0
attrs                             25.1.0
audioread                         3.0.1
auralis                           0.2.8.post2
beautifulsoup4                    4.13.1
blis                              0.7.11
cachetools                        5.5.1
catalogue                         2.0.10
certifi                           2025.1.31
cffi                              1.17.1
charset-normalizer                3.4.1
click                             8.1.8
cloudpathlib                      0.20.0
cloudpickle                       3.1.1
colorama                          0.4.6
compressed-tensors                0.8.0
confection                        0.1.5
cutlet                            0.5.0
cymem                             2.0.11
datasets                          3.2.0
decorator                         5.1.1
dill                              0.3.8
diskcache                         5.6.3
distro                            1.9.0
docopt                            0.6.2
EbookLib                          0.18
einops                            0.8.0
executing                         2.2.0
fastapi                           0.115.8
ffmpeg                            1.4
filelock                          3.17.0
frozenlist                        1.5.0
fsspec                            2024.9.0
fugashi                           1.4.0
future                            1.0.0
gguf                              0.10.0
h11                               0.14.0
hangul-romanize                   0.1.0
httpcore                          1.0.7
httptools                         0.6.4
httpx                             0.28.1
huggingface-hub                   0.28.1
idna                              3.10
importlib_metadata                8.6.1
iniconfig                         2.0.0
interegular                       0.3.3
ipython                           8.32.0
jaconv                            0.4.0
jedi                              0.19.2
Jinja2                            3.1.5
jiter                             0.8.2
joblib                            1.4.2
jsonschema                        4.23.0
jsonschema-specifications         2024.10.1
langcodes                         3.5.0
langid                            1.1.6
language_data                     1.3.0
lark                              1.2.2
lazy_loader                       0.4
librosa                           0.10.2.post1
llvmlite                          0.44.0
lm-format-enforcer                0.10.9
lxml                              5.3.0
marisa-trie                       1.2.1
markdown-it-py                    3.0.0
MarkupSafe                        3.0.2
matplotlib-inline                 0.1.7
mdurl                             0.1.2
mistral_common                    1.5.2
mojimoji                          0.0.13
mpmath                            1.3.0
msgpack                           1.1.0
msgspec                           0.19.0
multidict                         6.1.0
multiprocess                      0.70.16
murmurhash                        1.0.12
nest-asyncio                      1.6.0
networkx                          3.4.2
num2words                         0.5.14
numba                             0.61.0
numpy                             1.26.4
nvidia-cublas-cu12                12.4.5.8
nvidia-cuda-cupti-cu12            12.4.127
nvidia-cuda-nvrtc-cu12            12.4.127
nvidia-cuda-runtime-cu12          12.4.127
nvidia-cudnn-cu12                 9.1.0.70
nvidia-cufft-cu12                 11.2.1.3
nvidia-curand-cu12                10.3.5.147
nvidia-cusolver-cu12              11.6.1.9
nvidia-cusparse-cu12              12.3.1.170
nvidia-ml-py                      12.570.86
nvidia-nccl-cu12                  2.21.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.4.127
openai                            1.61.0
OpenCC                            1.1.9
opencv-python-headless            4.11.0.86
outlines                          0.0.46
packaging                         24.2
pandas                            2.2.3
parso                             0.8.4
partial-json-parser               0.2.1.1.post5
pexpect                           4.9.0
pillow                            10.4.0
pip                               24.0
platformdirs                      4.3.6
pluggy                            1.5.0
pooch                             1.8.2
preshed                           3.0.9
prometheus_client                 0.21.1
prometheus-fastapi-instrumentator 7.0.2
prompt_toolkit                    3.0.50
propcache                         0.2.1
protobuf                          5.29.3
psutil                            6.1.1
ptyprocess                        0.7.0
pure_eval                         0.2.3
py-cpuinfo                        9.0.0
pyairports                        2.1.1
pyarrow                           19.0.0
pycountry                         24.6.1
pycparser                         2.22
pydantic                          2.10.6
pydantic_core                     2.27.2
Pygments                          2.19.1
pyloudnorm                        0.1.1
pypinyin                          0.53.0
pytest                            8.3.4
python-dateutil                   2.9.0.post0
python-dotenv                     1.0.1
pytz                              2025.1
PyYAML                            6.0.2
pyzmq                             26.2.1
ray                               2.42.0
referencing                       0.36.2
regex                             2024.11.6
requests                          2.32.3
rich                              13.9.4
rpds-py                           0.22.3
safetensors                       0.5.2
scikit-learn                      1.6.1
scipy                             1.15.1
sentencepiece                     0.2.0
setuptools                        75.8.0
shellingham                       1.5.4
six                               1.17.0
smart-open                        7.1.0
sniffio                           1.3.1
sounddevice                       0.5.1
soundfile                         0.13.1
soupsieve                         2.6
soxr                              0.5.0.post1
spacy                             3.7.5
spacy-legacy                      3.0.12
spacy-loggers                     1.0.5
srsly                             2.5.1
stack-data                        0.6.3
starlette                         0.45.3
sympy                             1.13.1
thinc                             8.2.5
threadpoolctl                     3.5.0
tiktoken                          0.7.0
tokenizers                        0.21.0
torch                             2.5.1
torchaudio                        2.5.1
torchvision                       0.20.1
tqdm                              4.67.1
traitlets                         5.14.3
transformers                      4.48.2
triton                            3.1.0
typer                             0.15.1
typing_extensions                 4.12.2
tzdata                            2025.1
urllib3                           2.3.0
uvicorn                           0.34.0
uvloop                            0.21.0
vllm                              0.6.4.post1
wasabi                            1.1.3
watchfiles                        1.0.4
wcwidth                           0.2.13
weasel                            0.4.1
websockets                        14.2
wrapt                             1.17.2
xformers                          0.0.28.post3
xxhash                            3.5.0
yarl                              1.18.3
zipp                              3.21.0

GPU Information (if applicable)


+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77                 Driver Version: 565.77         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   38C    P8             17W /  370W |   19834MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     57527      C   python3                                      3410MiB |
|    0   N/A  N/A   1602657      C   ...unners/cuda_v12/ollama_llama_server      16408MiB |
+-----------------------------------------------------------------------------------------+

CUDA version (if applicable)

-bash: nvcc: command not found


Possible Solutions

I have no clue, but I am willing to help!

Additional Information

The end goal is to use this GitHub project alongside https://github.com/sfortis/openai_tts

maxi1134 added the bug label on Feb 4, 2025
maxi1134 changed the title from "[BUG🐛]" to "[BUG🐛] Unable to generate voices using OpenAI api" on Feb 4, 2025