Added support for cuBLAS and CLBlast in ggml. #282
Conversation
To enable either of them, use `--features cublas` or `--features clblast` when running cargo. It looks like something went wrong yesterday with the latest commit in ggml, and ggml-opencl.* is kind of broken, so CLBlast in llm won't really work; with the older version of ggml, CLBlast in llm works. Enabling the features only makes the prompt/encoding part faster. It should work with Ubuntu 22.04 LTS too; I tested it only with Arch. It is a WIP version for #190.
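For context, here is a rough sketch of how a build script might gate the backends on those Cargo features, assuming the `cc` crate is used; the file names, defines, and linked libraries below are illustrative, not necessarily exactly what this PR does:

```rust
// build.rs sketch: compile/link the optional ggml backends only when the
// matching Cargo feature is on. Cargo exposes each active feature to the
// build script as a CARGO_FEATURE_* environment variable.
fn main() {
    let mut build = cc::Build::new();
    build.file("ggml/ggml.c");

    if std::env::var("CARGO_FEATURE_CLBLAST").is_ok() {
        build.file("ggml/ggml-opencl.c");
        build.define("GGML_USE_CLBLAST", None);
        println!("cargo:rustc-link-lib=clblast");
        println!("cargo:rustc-link-lib=OpenCL");
    }

    if std::env::var("CARGO_FEATURE_CUBLAS").is_ok() {
        // ggml-cuda.cu needs a separate nvcc step, elided in this sketch.
        build.define("GGML_USE_CUBLAS", None);
        println!("cargo:rustc-link-lib=cublas");
        println!("cargo:rustc-link-lib=cudart");
    }

    build.compile("ggml");
}
```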
Awesome! We need a way to find the relevant library files in a cross-platform way, but this is definitely a good start.
Is there anything to do here? What are the next steps to get this working?
There are at least two main issues that need to be solved:
Hi, I've tested this on Arch and it works, but it only takes up about 300 MB of GPU RAM. Are you planning to add n_gpu_layers to this PR anytime soon?
The main purpose of this PR is to enable BLAS in ggml so that llm can build on top of it.
@darxkies What exactly is in the two directories?
That is where CUDA stores its CUDA Toolkit/SDK files, at least in Arch. The two directories are relative to the directory set via CUDA_PATH, and in Arch it is set to /opt/cuda. How do you compile it? Do you use WSL or something, or directly in Windows?
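For illustration, a build script could pick up that variable along these lines; this is a sketch, and the library subdirectories are assumptions based on Arch's layout and a typical Windows CUDA install:

```rust
// build.rs fragment (sketch): locate the CUDA libraries via CUDA_PATH.
use std::path::PathBuf;

fn cuda_lib_dir() -> PathBuf {
    // Fall back to Arch's default toolkit location if CUDA_PATH is unset.
    let root = std::env::var("CUDA_PATH").unwrap_or_else(|_| "/opt/cuda".into());
    // Build scripts see the *target* OS via this Cargo-provided variable.
    let target_os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    let sub = if target_os == "windows" { "lib/x64" } else { "lib64" };
    PathBuf::from(root).join(sub)
}

fn main() {
    println!("cargo:rustc-link-search=native={}", cuda_lib_dir().display());
    println!("cargo:rustc-link-lib=cublas");
    println!("cargo:rustc-link-lib=cudart");
}
```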
I'm trying to build natively on Windows; in WSL the build worked, as it's just a normal Linux distro. For cuBLAS we only have to support Linux and Windows, as you can't fit an Nvidia GPU into a Mac. Because of this, I want to make building natively on Windows possible. The directory structure is completely different, but I think I'm getting there with some documentation and ChatGPT.
I installed everything in a VM to compile llm on Windows natively. I'll take a look at cuBLAS and CLBlast tomorrow.
@darxkies That would be great. I hacked something together which maybe is kind of working, but I'm still getting weird compiler errors. I'll start taking a look at how llama.cpp moves layers to the GPU instead.
OK, I read the llama.cpp code, and as far as I can tell we need some sort of flag in our ggml crate that we can query at runtime to determine which backend it was compiled with, as OpenCL, CUDA, and Metal are all handled differently.
The Rust features cublas and clblast included in the patch could be used for that, no?
I would like to avoid feature flags in llm-base. Maybe we could give the ggml crate a function that returns a backend enum dependent on the feature flags it was compiled with.
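Something along these lines could work — a minimal sketch with hypothetical names, assuming the ggml crate declares these features:

```rust
/// Hypothetical: the acceleration backend this build of ggml was compiled with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    CuBlas,
    ClBlast,
    Metal,
}

/// Reports the backend baked into this build. `cfg!` is evaluated at
/// compile time, so callers in llm-base can query this at runtime without
/// declaring the feature flags themselves.
pub fn backend() -> Backend {
    if cfg!(feature = "cublas") {
        Backend::CuBlas
    } else if cfg!(feature = "clblast") {
        Backend::ClBlast
    } else if cfg!(feature = "metal") {
        Backend::Metal
    } else {
        Backend::Cpu
    }
}
```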
Is there a way to pass features down to GGML without having to define them in llm-base? Basically, it is possible to have cuBLAS and CLBlast baked in simultaneously, so an enum might not be enough, right? I've updated the PR and added support for Windows and cuBLAS; I will take a look at CLBlast and Windows these days. Please let me know if it works. I checked with GPU-Z, and it does show that VRAM is used when compiled with cuBLAS.
Now it compiles on Windows with the Rust toolchain and CUDA installed, good job 👏 It also seems to allocate some VRAM when I try to start an inference session 👍 Are the CUDA-specific ggml functions already present in the ggml crate when I compile that way, or do I need to implement them manually? @philpax, probably something you should know 😓
I don't think you can have cuBLAS and CLBlast active at the same time. In my opinion we should compile ggml with only one acceleration backend at a time, and then vendor the ggml crate with the specific backends. But maybe I'm missing something.
There is no support for generating ggml-cuda bindings as of now. As I understand it, the purpose of generate-ggml-bindings is to generate the bindings for ggml, manually. For this to work with the new features, generate-ggml-bindings would have to be extended to always generate bindings for ggml, ggml-cuda, and ggml-opencl into three separate files. In the ggml crate, those three Rust files would then be included in lib.rs based on the selected features.
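In lib.rs that could look roughly like this — the file names are assumptions for illustration:

```rust
// Always include the core ggml bindings.
include!("bindings/ggml.rs");

// Include the ggml-cuda bindings only when built with `--features cublas`.
#[cfg(feature = "cublas")]
include!("bindings/ggml_cuda.rs");

// Include the ggml-opencl bindings only when built with `--features clblast`.
#[cfg(feature = "clblast")]
include!("bindings/ggml_opencl.rs");
```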
I extended generate-ggml-bindings.
…s/cuBLAS needs testing. If both features are specified, cuBLAS will be selected at compile time.
Reworked. Use `metal` in the features list to activate it.
Updated the PR to the latest llama.cpp, and so far it works with Arch/cuBLAS/CLBlast and Windows/cuBLAS. Metal support is also included, but it is untested.
To get it working, there are a couple of steps necessary. CLBlast is installed using vcpkg; the link below describes how to install vcpkg.

https://vcpkg.io/en/getting-started.html

The commands look like this:

```
git clone https://github.com/Microsoft/vcpkg.git
.\vcpkg\bootstrap-vcpkg.bat
.\vcpkg\vcpkg install clblast
set OPENCL_PATH=....\vcpkg\packages\opencl_x64-windows
set CLBLAST_PATH=....\vcpkg\packages\clblast_x64-windows
```

The environment variables need the full path and are used by llm to compile. llm will need clblast.dll and OpenCL.dll to run; they are in the bin subdirectory of the corresponding vcpkg packages. Compile llm with CLBlast support:

```
cargo run --release --features clblast -- mpt infer --model-path mpt-7b-chat-q4_0-ggjt.bin -p "Once upon a time"
```
Windows/CLBlast works. The commit includes the details on how to use it. It needs more testing.
This should be ready. @danforbes, could you check the documentation before we summon the big guy for a review? And we need someone to test CLBlast on macOS 😅
I am the author of ggerganov/llama.cpp#1087. I can't give you a timeframe for when it could be merged, since it's a bit experimental. However, you may have some success using it.
I could try this again later; using the environment variables listed earlier, I might be able to fix the issue where it couldn't find CLBlast on my machine.
```sh
$ ls ~/.brew/Cellar/clblast/1.6.0
CHANGELOG            LICENSE    bin      lib
INSTALL_RECEIPT.json README.md  include  share
$ CLBLAST_PATH=~/.brew/Cellar/clblast/1.6.0 cargo run --release --features clblast mpt infer --model-path ~/Downloads/models/mpt-7b-chat-q4_0-ggjt.bin -p "Once upon a time"
```

That throws a bunch of warnings and one error:
@LLukas22 It seems that is an issue that needs to be fixed in llama.cpp. Should we remove macOS/CLBlast from the compatibility matrix in the documentation and leave the code as it is?
@darxkies I think we can consider CLBlast for macOS out of scope for now; realistically, macOS will always want to use Metal if possible. If it's fixed in llama.cpp later on, we could enable support via another PR.
Fantastic work! I can't test this at present, but it looks quite comprehensive and I really appreciate the documentation.
Once that comment's added and @LLukas22's happy with everything, I have no problems with him merging it in.
Thanks for the great work! 🚀
`.cargo/config` (Outdated)

```diff
@@ -0,0 +1,2 @@
+[target.x86_64-pc-windows-msvc]
+rustflags = ["-Ctarget-feature=+crt-static"]
```
Can you add a comment about why this is necessary?
Without that option, the compiler throws warnings when CLBlast is enabled.
Hm, would it be possible to move that to build.rs, have it turn on if and only if it's msvc + clblast, and add a comment? Changing how the CRT is linked can cause global issues, so I'd prefer to reduce the scope of this. (I think you can turn target features on from build.rs)
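One possible direction, sketched below under the assumption that ggml's C sources are compiled through the `cc` crate in build.rs. Note this scopes the CRT choice to the C objects rather than changing how the Rust side links the CRT, which is a narrower change than the .cargo/config flag:

```rust
// build.rs — a sketch, assuming ggml's C sources are compiled via the
// `cc` crate. Statically link the MSVC CRT (/MT) for the C objects only
// when targeting MSVC with the `clblast` feature enabled.
fn main() {
    let mut build = cc::Build::new();
    build.file("ggml/ggml.c");

    let msvc = std::env::var("CARGO_CFG_TARGET_ENV").map_or(false, |v| v == "msvc");
    let clblast = std::env::var("CARGO_FEATURE_CLBLAST").is_ok();
    if msvc && clblast {
        // Matches a statically built CLBlast and should avoid cl.exe
        // warning D9025 ("overriding '/MD' with '/MT'").
        build.static_crt(true);
    }

    build.compile("ggml");
}
```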
@LLukas22 In that case, the documentation needs to be updated to remove macOS/CLBlast as supported.
@darxkies I agree with philpax; it would be nice if we could move the flag into build.rs somehow. Then we just update the documentation with a hint that CLBlast is currently broken on macOS, give this a final test, and merge it.
I wouldn't add a hint that it is broken; just remove it altogether, at least in the documentation. Regarding the flag, that was the "recommended" way of solving it. I will take a look at it this weekend; perhaps overriding RUSTFLAGS in build.rs. Before merging, it would also make sense to update llama.cpp.
@darxkies OK, the linking stuff doesn't seem to be trivial; the Rust RFCs have a short section about it: https://rust-lang.github.io/rfcs/1721-crt-static.html#customizing-linkage-to-the-c-runtime Regarding the documentation, I will just remove the CLBlast section for macOS and mark it as unsupported.
Given that, I'd suggest removing the CRT override entirely and moving it to documentation. I suspect that's a decision that's best made by the integrator / application author.
@darxkies I tested my configuration a bit, and I can't reproduce the compiler warnings when I remove the .cargo file.
Very interesting. If you do a `cargo clean` and then compile with clblast enabled, the warning should be there. I will take a look at it tomorrow.
…rence from the documentation.
I did a `git pull`, `cargo clean`, and `cargo build --release --features clblast`, and this is the warning I got that I referred to previously:

```
warning: cl : Command line warning D9025 : overriding '/MD' with '/MT'
```