This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Added support for cuBLAS and CLBlast in ggml. #282

Merged
merged 29 commits into rustformers:main on Jun 18, 2023

Conversation

darxkies
Contributor

To enable either of them, use --features cublas or --features clblast when running cargo.

It looks like something went wrong yesterday with the latest commit in ggml, and ggml-opencl.* is kind of broken, so clblast in llm won't really work. With the older version of ggml, clblast in llm works.

Enabling the features only speeds up the prompt/encoding part.

It should work with Ubuntu 22.04 LTS too; I have only tested it with Arch.

It is a WIP version for #190.

@philpax
Collaborator

philpax commented May 28, 2023

Awesome! We need a way to find the relevant library files in a cross-platform way, but this is definitely a good start.

@LLukas22
Contributor

LLukas22 commented Jun 1, 2023

Is there anything to do here? What are the next steps to get this working?

@darxkies
Contributor Author

darxkies commented Jun 1, 2023

There are at least two main issues that need to be solved:

  • the PR needs to be tested/adapted for Windows/macOS; so far, it has only been tested with Arch
  • CLBlast does not compile, so it does not work; it needs ggml-org/ggml#222 (FIX CLBLAST Compile) to fix ggml-opencl in ggml

@zeon256

zeon256 commented Jun 7, 2023

Hi, I've tested this on Arch and it works, but it only uses about 300 MB of GPU RAM. Are you planning to add n_gpu_layers to this PR anytime soon?

@darxkies
Contributor Author

darxkies commented Jun 7, 2023

The main purpose of this PR is to enable BLAS in ggml so that llm can build on top of it.

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

@darxkies What exactly is in the /targets/x86_64-linux/include and /targets/x86_64-linux/lib directories? I'm trying to get cuBLAS working on Windows and the CUDA directory structure is different 😓

@darxkies
Contributor Author

darxkies commented Jun 7, 2023

That is where the CUDA Toolkit/SDK stores its files, at least on Arch. The two directories are relative to the directory set via CUDA_PATH, which on Arch is set to /opt/cuda. How do you compile it? Do you use WSL, or do you build directly on Windows?
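For illustration, a minimal build.rs-style sketch of how those paths could be resolved from CUDA_PATH, falling back to /opt/cuda as on Arch (the names and cargo directives here are assumptions for illustration, not the PR's actual build script):

use std::env;
use std::path::PathBuf;

fn main() {
    // Resolve the CUDA Toolkit root from CUDA_PATH, falling back to Arch's default.
    let cuda_root = PathBuf::from(env::var("CUDA_PATH").unwrap_or_else(|_| String::from("/opt/cuda")));

    // On Arch, headers and libraries live under targets/x86_64-linux/.
    let include_dir = cuda_root.join("targets/x86_64-linux/include");
    let lib_dir = cuda_root.join("targets/x86_64-linux/lib");

    // Tell cargo where to search and what to link; include_dir would be handed
    // to the C compiler / bindgen invocation that builds the CUDA parts of ggml.
    println!("cargo:rustc-link-search=native={}", lib_dir.display());
    println!("cargo:rustc-link-lib=cublas");
    println!("cargo:rustc-link-lib=cudart");
    println!("cargo:rerun-if-env-changed=CUDA_PATH");
    let _ = include_dir;
}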

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

I'm trying to build natively on Windows; in WSL the build worked, as it's just a normal Linux distro. For cuBLAS we only have to support Linux and Windows, since you can't fit an NVIDIA GPU into a Mac. Because of this, I want to make building natively on Windows possible. The directory structure is completely different, but I think I'm getting there with some documentation and ChatGPT.
Another problem I'm facing is that the nvcc compiler uses MSVC, which means I have to somehow resolve the CUDA directory and the local MSBuild directory.

@darxkies
Contributor Author

darxkies commented Jun 7, 2023

I installed everything in a VM to compile llm natively on Windows. I'll take a look at cuBLAS and CLBlast there tomorrow.

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

@darxkies That would be great. I hacked something together that may be kind of working, but I'm still getting weird compiler errors. I'll start taking a look at how llama.cpp moves layers to the GPU instead.

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

OK, I read the llama.cpp code, and as far as I can tell we need some sort of flag in our ggml crate that we can query at runtime to determine which backend it was compiled with, as OpenCL, CUDA and Metal are all handled differently.

@darxkies
Contributor Author

darxkies commented Jun 8, 2023

The Rust features cublas and clblast included in the patch could be used for that. No?

@LLukas22
Contributor

LLukas22 commented Jun 8, 2023

I would like to avoid feature flags in llm-base. Maybe we could give the ggml crate a function that returns a backend enum, dependent on the feature flags it was compiled with.
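A minimal sketch of that idea (hypothetical names, not code from this PR), with cuBLAS taking precedence when several features are enabled, mirroring the behaviour described in the later commits here:

/// Which acceleration backend the ggml crate was compiled with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    CuBlas,
    ClBlast,
    Metal,
}

/// Returns the backend selected at compile time via the crate's feature flags.
pub fn backend() -> Backend {
    if cfg!(feature = "cublas") {
        Backend::CuBlas
    } else if cfg!(feature = "clblast") {
        Backend::ClBlast
    } else if cfg!(feature = "metal") {
        Backend::Metal
    } else {
        Backend::Cpu
    }
}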

@darxkies
Contributor Author

darxkies commented Jun 8, 2023

Is there a way to pass features down to GGML without having to define them in llm-base? Basically, it is possible to have cuBLAS and CLBlast simultaneously baked in. An enum might not be enough, right?

I've updated the PR and added support for Windows and cuBLAS. I will take a look at CLBlast on Windows in the next few days. Please let me know if it works. I checked with GPU-Z and it does show that VRAM is used when compiled with cuBLAS.

@LLukas22
Contributor

LLukas22 commented Jun 9, 2023

Now it compiles on Windows with the Rust toolchain and CUDA installed, good job 👏 It also seems to allocate some VRAM when I try to start an inference session 👍

Are the CUDA-specific ggml functions already present in the ggml crate when I compile that way, or do I need to implement them manually? @philpax probably something you should know 😓

Is there a way to pass features down to GGML without having to define them in llm-base? Basically, it is possible to have cuBLAS and CLBlast simultaneously baked in. An enum might not be enough, right?

I don't think you can have cuBLAS and CLBlast active at the same time. In my opinion we should compile ggml with only one acceleration backend at a time, and then vendor the ggml crate with the specific backends. But maybe I'm missing something.

@darxkies
Contributor Author

darxkies commented Jun 9, 2023

There is no support for generating ggml-cuda bindings as of now. As I understand it, the purpose of generate-ggml-bindings is to generate the bindings for ggml, manually.

For this to work with the new features, generate-ggml-bindings would have to be extended to always generate the bindings for ggml, ggml-cuda and ggml-opencl into three separate files. In the ggml crate, those three Rust files would then be included in lib.rs based on the selected features.

@LLukas22
Contributor

LLukas22 commented Jun 9, 2023

I extended generate-ggml-bindings to generate lib_cuda.rs and lib_opencl.rs, which are included into lib.rs depending on the feature flag that is set. This should give us access to these functions.
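Roughly, the wiring in lib.rs could then look like this (a sketch of the idea, not necessarily the exact code):

// Pull in the backend-specific bindings only when the matching feature is enabled.
#[cfg(feature = "cublas")]
include!("lib_cuda.rs");

#[cfg(feature = "clblast")]
include!("lib_opencl.rs");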

darxkies added 3 commits June 10, 2023 08:56
…s/cuBLAS needs testing. If both features are specified cuBLAS will be selected at compile time.
reworked. Use metal in features list to activate it.
@darxkies
Contributor Author

Updated the PR to build against llama.cpp, and so far it works with Arch/cuBLAS/CLBlast & Windows/cuBLAS. Metal support is also included, but it is untested.

To get it working, a couple of steps are necessary.

CLBlast is installed using vcpkg. The link below describes how to install vcpkg:
https://vcpkg.io/en/getting-started.html

The commands look like this:

git clone https://github.com/Microsoft/vcpkg.git
.\vcpkg\bootstrap-vcpkg.bat
.\vcpkg\vcpkg install clblast
set OPENCL_PATH=....\vcpkg\packages\opencl_x64-windows
set CLBLAST_PATH=....\vcpkg\packages\clblast_x64-windows

The environment variables need the full path and are used by llm to compile.

llm will need clblast.dll & OpenCL.dll to run. They are in the bin subdirectory of the corresponding vcpkg packages.

Compile llm with clblast support:

cargo run --release --features clblast -- mpt infer --model-path mpt-7b-chat-q4_0-ggjt.bin -p "Once upon a time"
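For illustration, a hedged build.rs sketch of how llm could consume the two variables (the real build script may differ; the library names and the lib subdirectory are assumptions):

use std::env;
use std::path::PathBuf;

fn main() {
    // Both variables must hold the full paths to the vcpkg packages, as noted above.
    let opencl = PathBuf::from(env::var("OPENCL_PATH").expect("set OPENCL_PATH"));
    let clblast = PathBuf::from(env::var("CLBLAST_PATH").expect("set CLBLAST_PATH"));

    // Point the linker at the vcpkg lib directories and link both libraries.
    println!("cargo:rustc-link-search=native={}", opencl.join("lib").display());
    println!("cargo:rustc-link-search=native={}", clblast.join("lib").display());
    println!("cargo:rustc-link-lib=OpenCL");
    println!("cargo:rustc-link-lib=clblast");
}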
@darxkies darxkies mentioned this pull request Jun 10, 2023
@darxkies
Contributor Author

Windows/CLBlast works. The commit contains the details on how to use it. It needs more testing.

@LLukas22
Contributor

This should be ready. @danforbes, could you check the documentation before we summon the big guy for a review?

And we need someone to test clblast on MacOS 😅

@SlyEcho

SlyEcho commented Jun 12, 2023

hipBLAS is only in the PR, right? It is not in the master branch of llama.cpp as far as I can see.

I am the author of ggerganov/llama.cpp#1087

I can't give you a timeframe for when it could be merged, since it's a bit experimental.

However, you may have some success using hipify-perl or something similar, unless you need to have the CMake definitions in the code.

@pixelspark
Contributor

And we need someone to test clblast on MacOS 😅

I could try this again later; using the environment variables listed earlier, I might be able to fix the issue where it couldn't find CLBlast on my machine.

$ ls ~/.brew/Cellar/clblast/1.6.0
CHANGELOG		LICENSE			bin			lib
INSTALL_RECEIPT.json	README.md		include			share

$ CLBLAST_PATH=~/.brew/Cellar/clblast/1.6.0 cargo run --release --features clblast mpt infer --model-path ~/Downloads/models/mpt-7b-chat-q4_0-ggjt.bin -p "Once upon a time"

That throws a bunch of warnings and one error:

error: failed to run custom build command for `ggml-sys v0.2.0-dev (/Users/tommy/Repos/llm/crates/ggml/sys)`

Caused by:
  process didn't exit successfully: `/Users/tommy/Repos/llm/target/release/build/ggml-sys-2d4c323b8b047b52/build-script-build` (exit status: 1)

*snip*

  cargo:warning=llama-cpp/ggml-opencl.cpp:606:25: error: non-aggregate type 'std::atomic_flag' cannot be initialized with an initializer list
  cargo:warning=static std::atomic_flag g_cl_pool_lock = ATOMIC_FLAG_INIT;

@pixelspark pixelspark mentioned this pull request Jun 13, 2023
@darxkies
Contributor Author

@LLukas22 It seems that is an issue that needs to be fixed in llama.cpp. Should we remove macOS/CLBlast from the compatibility matrix in the documentation and leave the code as it is?

@LLukas22
Contributor

@darxkies I think we can consider CLBlast on macOS out of scope for now; realistically, macOS users will always want to use Metal if possible. If it's fixed in llama.cpp later on, we could enable support via another PR.

Collaborator

@philpax philpax left a comment

Fantastic work! I can't test this at present, but it looks quite comprehensive and I really appreciate the documentation.

Once that comment's added and @LLukas22's happy with everything, I have no problems with him merging it in.

Thanks for the great work! 🚀

.cargo/config Outdated
@@ -0,0 +1,2 @@
[target.x86_64-pc-windows-msvc]
rustflags = ["-Ctarget-feature=+crt-static"]
Collaborator

Can you add a comment about why this is necessary?

Contributor Author

Without that option, the compiler throws warnings when CLBlast is enabled.

Collaborator

Hm, would it be possible to move that to build.rs, have it turn on if and only if it's msvc + clblast, and add a comment? Changing how the CRT is linked can cause global issues, so I'd prefer to reduce the scope of this. (I think you can turn target features on from build.rs)
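A hedged sketch of what that check could look like in build.rs, assuming the C/C++ sources are built with the cc crate (whether cc's static_crt fully replaces the crt-static rustflag here is an assumption, not something verified in this PR):

fn main() {
    // Only adjust the C runtime when targeting MSVC with the clblast feature enabled.
    let is_msvc = std::env::var("CARGO_CFG_TARGET_ENV").map_or(false, |v| v == "msvc");
    let clblast_enabled = std::env::var("CARGO_FEATURE_CLBLAST").is_ok();

    let mut build = cc::Build::new();
    build.file("llama-cpp/ggml.c");
    if is_msvc && clblast_enabled {
        // Compile the C/C++ objects against the static CRT (/MT) instead of
        // relying on a global rustflag in .cargo/config.
        build.static_crt(true);
    }
    build.compile("ggml");
}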

@philpax philpax added the topic:backend-support Support for alternate non-GGML backends, or for particular GGML backend features label Jun 15, 2023
@darxkies
Contributor Author

@LLukas22 In that case, the documentation needs to be updated to remove macOS/CLBlast as supported.

@LLukas22
Contributor

@darxkies I agree with philpax; it would be nice if we could move the flag into build.rs somehow. Then we just update the documentation with a hint that clblast is currently broken on macOS, give this a final test, and merge it.

@darxkies
Contributor Author

I wouldn't add a hint that it is broken; I would just remove it altogether, at least in the documentation.

Regarding the flag, that was the "recommended" way of solving it. I will take a look at it this weekend. Perhaps overriding RUSTFLAGS in build.rs.

Before merging it, it would make sense to update llama.cpp too.

@LLukas22
Contributor

@darxkies OK, the linking stuff doesn't seem to be trivial; the crt-static RFC has a short section about it: https://rust-lang.github.io/rfcs/1721-crt-static.html#customizing-linkage-to-the-c-runtime

Regarding the documentation, I will just remove the clblast section for macOS and mark it as unsupported.

@philpax
Collaborator

philpax commented Jun 16, 2023

Given that, I'd suggest removing the CRT override entirely and moving it to documentation. I suspect that's a decision that's best made by the integrator / application author.

@LLukas22
Contributor

@darxkies I tested my configuration a bit and I can't reproduce the compiler warnings when I remove the .cargo file.
I removed it from the PR and added a disclaimer to the documentation that it may be necessary for certain setups.
What are your thoughts on this?

@darxkies
Contributor Author

Very interesting. If you run cargo clean and then compile with clblast enabled, the warning should be there. I will take a look at it tomorrow.

@darxkies
Contributor Author

I did a git pull, cargo clean, and cargo build --release --features clblast, and this is the warning I got, which I referred to previously:

warning: cl : Command line warning D9025 : overriding '/MD' with '/MT'

@LLukas22 LLukas22 merged commit 3becd72 into rustformers:main Jun 18, 2023
@hhamud hhamud mentioned this pull request Aug 7, 2023