If you don't want to quantize the model:
We have not tried 24 GB GPUs, but as a rough estimate, the GPU memory cost of hosting SPHINX across two GPUs should be close to 24 GB (without quantization). So you might be able to run it without quantization after some optimization, but it would be a very tight fit.
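For a back-of-the-envelope check of that estimate, the weight memory is roughly parameter count × bytes per parameter. A minimal sketch, assuming a 13B-parameter backbone (SPHINX builds on LLaMA-2; the exact parameter count here is an assumption) and ignoring activations and the KV cache:

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

N = 13e9  # assumed parameter count, for illustration only
fp16 = weight_memory_gb(N, 16)  # roughly 24 GiB: tight on a single 24 GB card
int4 = weight_memory_gb(N, 4)   # roughly 6 GiB: fits comfortably once quantized
print(f"fp16: {fp16:.1f} GiB, int4: {int4:.1f} GiB")
```

This is why fp16 weights alone nearly exhaust a 24 GB card before any activations or KV cache are allocated, while 4-bit quantization leaves ample headroom.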
A 24 GB GPU runs out of memory.
Would you consider releasing a smaller model, one that can run under 24 GB?