Tracking issue for SPHINX quantization & other memory issues #114
When I used 4*16GB VRAM GPUs to run the scripts ...
Looking forward to your reply!
With 16GB of memory per GPU, SPHINX will need to run on all 4 GPUs (26GB for LM params, 6GB for visual params, 4GB for kv-cache, and 3GB for SAM, adding up to roughly 39GB, more than two 16GB GPUs can hold). We plan to add this support in the next few days (currently, we only support running on 1 or 2 GPUs).
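As a rough illustration of this memory accounting, here is a minimal sketch in Python. The component sizes come from the breakdown above; the per-GPU budget and the note about overhead are assumptions, not measurements from the repository.

```python
import math

# Component sizes (GB) from the maintainer's breakdown above.
components = {
    "LM params": 26,
    "visual params": 6,
    "kv-cache": 4,
    "SAM": 3,
}

total_gb = sum(components.values())          # ~39GB in total
per_gpu_gb = 16                              # assumed per-GPU memory budget

# 39GB / 16GB = 3 GPUs in theory; in practice activation memory and
# fragmentation overhead push this to all 4 GPUs.
min_gpus = math.ceil(total_gb / per_gpu_gb)
print(f"total: {total_gb}GB, needs at least {min_gpus} x {per_gpu_gb}GB GPUs")
```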
#116 fixes inference memory with image input. We have now moved inference development to small (24/16GB) GPUs to avoid such errors slipping by on the large training GPUs.
Ongoing developments as of 27 Nov: FP16 inference memory optimizations
If you have other feature requests about SPHINX inference, please feel free to reply under this issue.
We have received several requests (#112, #110, #97) to run SPHINX inference on GPUs with smaller memory. We also believe that fitting it under the 24GB memory bar benefits a broad range of users who would like to run the model locally on commodity GPUs like the 3090 or 4090.
With the latest update #113, NF4 quantization should now run on SPHINX without errors (i.e., resolving #97). The memory usage is a bit less than 23GB, so it should fit on a single 24GB GPU (3090, 4090 or A5000) even with ECC turned on.
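For readers unfamiliar with NF4, the sketch below shows the general pattern of 4-bit NF4 loading with Hugging Face transformers and bitsandbytes. Note that SPHINX itself is loaded through LLaMA2-Accessory's own scripts and flags (see #113), so the model name and loader here are placeholders for illustration, not the repository's actual API.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 (4-bit NormalFloat) quantization: weights are stored in 4 bits instead of
# 16, roughly quartering weight memory, which is what makes a 24GB fit possible.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,   # matmuls still run in FP16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

# "your-model-name" is a placeholder; SPHINX is loaded via LLaMA2-Accessory's scripts.
model = AutoModelForCausalLM.from_pretrained(
    "your-model-name",
    quantization_config=bnb_config,
    device_map="auto",
)
```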
We are still running a complete benchmark of the quantized model and will post the latest results under this issue. Meanwhile, any questions are welcome :)