This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Added support for cuBLAS and CLBlast in ggml. #282

Merged
merged 29 commits into rustformers:main on Jun 18, 2023

Conversation

darxkies
Contributor

To enable either of them, use --features cublas or --features clblast when running cargo.

It looks like something went wrong yesterday with the latest commit in ggml, and ggml-opencl.* is kind of broken, so clblast in llm won't really work. With the older version of ggml, clblast in llm works.

Enabling the features only speeds up the prompt/encoding part.

It should work with Ubuntu 22.04 LTS too; I have only tested it with Arch.

It is a WIP version for #190.

@philpax
Collaborator

philpax commented May 28, 2023

Awesome! We need a way to find the relevant library files in a cross-platform way, but this is definitely a good start.

@LLukas22
Contributor

LLukas22 commented Jun 1, 2023

Is there anything to do here? What are the next steps to get this working?

@darxkies
Contributor Author

darxkies commented Jun 1, 2023

There are at least two main issues that need to be solved:

  • the PR needs to be tested/adapted for Windows/macOS; so far, it has only been tested with Arch
  • CLBlast does not compile, so it does not work; it needs ggml-org/ggml#222 (FIX CLBLAST Compile) to fix ggml-opencl in ggml

@zeon256

zeon256 commented Jun 7, 2023

Hi, I've tested this on Arch and it works, but it only uses about 300 MB of GPU RAM. Are you planning to add n_gpu_layers to this PR anytime soon?

@darxkies
Contributor Author

darxkies commented Jun 7, 2023

The main purpose of this PR is to enable BLAS in ggml so that llm can build on top of it.

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

@darxkies What exactly is in the /targets/x86_64-linux/include and /targets/x86_64-linux/lib directories? I'm trying to get cuBLAS working on Windows and the CUDA directory structure is different 😓

@darxkies
Contributor Author

darxkies commented Jun 7, 2023

That is where the CUDA Toolkit/SDK stores its files, at least on Arch. The two directories are relative to the directory set via CUDA_PATH, which on Arch is set to /opt/cuda. How do you compile it? Do you use WSL, or do you build directly on Windows?
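For illustration, a minimal build.rs-style sketch of how those paths could be resolved from CUDA_PATH, falling back to /opt/cuda as on Arch (the names and cargo directives here are assumptions for illustration, not the PR's actual build script):

use std::env;
use std::path::PathBuf;

fn main() {
    // Resolve the CUDA Toolkit root from CUDA_PATH, falling back to Arch's default.
    let cuda_root = PathBuf::from(env::var("CUDA_PATH").unwrap_or_else(|_| String::from("/opt/cuda")));

    // On Arch, headers and libraries live under targets/x86_64-linux/.
    let include_dir = cuda_root.join("targets/x86_64-linux/include");
    let lib_dir = cuda_root.join("targets/x86_64-linux/lib");

    // Tell cargo where to search and what to link; include_dir would be handed
    // to the C compiler / bindgen invocation that builds the CUDA parts of ggml.
    println!("cargo:rustc-link-search=native={}", lib_dir.display());
    println!("cargo:rustc-link-lib=cublas");
    println!("cargo:rustc-link-lib=cudart");
    println!("cargo:rerun-if-env-changed=CUDA_PATH");
    let _ = include_dir;
}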

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

I'm trying to build natively on Windows; in WSL the build worked, as it's just a normal Linux distro. For cuBLAS we only have to support Linux and Windows, since you can't fit an NVIDIA GPU into a Mac. Because of this, I want to make building natively on Windows possible. The directory structure is completely different, but I think I'm getting there with some documentation and ChatGPT.
Another problem I'm facing is that the nvcc compiler uses MSVC, which means I have to somehow resolve the CUDA directory and the local MSBuild directory.

@darxkies
Contributor Author

darxkies commented Jun 7, 2023

I installed everything in a VM to compile llm natively on Windows. I'll take a look at cuBLAS and CLBlast there tomorrow.

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

@darxkies That would be great. I hacked something together that may be kind of working, but I'm still getting weird compiler errors. I'll start taking a look at how llama.cpp moves layers to the GPU instead.

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

OK, I read the llama.cpp code, and as far as I can tell we need some sort of flag in our ggml crate that we can query at runtime to determine which backend it was compiled with, as OpenCL, CUDA and Metal are all handled differently.

@darxkies
Contributor Author

darxkies commented Jun 8, 2023

The Rust features cublas and clblast included in the patch could be used for that. No?

@LLukas22
Contributor

LLukas22 commented Jun 8, 2023

I would like to avoid feature flags in llm-base. Maybe we could give the ggml crate a function that returns a backend enum, dependent on the feature flags it was compiled with.
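A minimal sketch of that idea (hypothetical names, not code from this PR), with cuBLAS taking precedence when several features are enabled, mirroring the behaviour described in the later commits here:

/// Which acceleration backend the ggml crate was compiled with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    CuBlas,
    ClBlast,
    Metal,
}

/// Returns the backend selected at compile time via the crate's feature flags.
pub fn backend() -> Backend {
    if cfg!(feature = "cublas") {
        Backend::CuBlas
    } else if cfg!(feature = "clblast") {
        Backend::ClBlast
    } else if cfg!(feature = "metal") {
        Backend::Metal
    } else {
        Backend::Cpu
    }
}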

@darxkies
Contributor Author

darxkies commented Jun 8, 2023

Is there a way to pass features down to GGML without having to define them in llm-base? Basically, it is possible to have cuBLAS and CLBlast simultaneously baked in. An enum might not be enough, right?

I've updated the PR and added support for Windows and cuBLAS. I will take a look at CLBlast on Windows in the next few days. Please let me know if it works. I checked with GPU-Z and it does show that VRAM is used when compiled with cuBLAS.

@LLukas22
Contributor

LLukas22 commented Jun 9, 2023

Now it compiles on Windows with the Rust toolchain and CUDA installed, good job 👏 It also seems to allocate some VRAM when I try to start an inference session 👍

Are the CUDA-specific ggml functions already present in the ggml crate when I compile that way, or do I need to implement them manually? @philpax probably something you should know 😓

Is there a way to pass features down to GGML without having to define them in llm-base? Basically, it is possible to have cuBLAS and CLBlast simultaneously baked in. An enum might not be enough, right?

I don't think you can have cuBLAS and CLBlast active at the same time. In my opinion we should compile ggml with only one acceleration backend at a time, and then vendor the ggml crate with the specific backends. But maybe I'm missing something.

@darxkies
Contributor Author

darxkies commented Jun 9, 2023

There is no support for generating ggml-cuda bindings as of now. As I understand it, the purpose of generate-ggml-bindings is to generate the bindings for ggml, manually.

For this to work with the new features, generate-ggml-bindings would have to be extended to always generate the bindings for ggml, ggml-cuda and ggml-opencl into three separate files. In the ggml crate, those three Rust files would then be included in lib.rs based on the selected features.

@LLukas22
Contributor

LLukas22 commented Jun 9, 2023

I extended generate-ggml-bindings to generate lib_cuda.rs and lib_opencl.rs, which are included into lib.rs depending on the feature flag that is set. This should give us access to these functions.
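Roughly, the wiring in lib.rs could then look like this (a sketch of the idea, not necessarily the exact code):

// Pull in the backend-specific bindings only when the matching feature is enabled.
#[cfg(feature = "cublas")]
include!("lib_cuda.rs");

#[cfg(feature = "clblast")]
include!("lib_opencl.rs");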

darxkies added 3 commits June 10, 2023 08:56
…s/cuBLAS needs testing. If both features are specified cuBLAS will be selected at compile time.
reworked. Use metal in features list to activate it.
@darxkies
Contributor Author

Updated the PR to build against llama.cpp, and so far it works with Arch/cuBLAS/CLBlast & Windows/cuBLAS. Metal support is also included, but it is untested.

To get it working, a couple of steps are necessary.

CLBlast is installed using vcpkg. The link below describes how to install vcpkg:
https://vcpkg.io/en/getting-started.html

The commands look like this:

git clone https://github.com/Microsoft/vcpkg.git
.\vcpkg\bootstrap-vcpkg.bat
.\vcpkg\vcpkg install clblast
set OPENCL_PATH=....\vcpkg\packages\opencl_x64-windows
set CLBLAST_PATH=....\vcpkg\packages\clblast_x64-windows

The environment variables need the full path and are used by llm to compile.

llm will need clblast.dll & OpenCL.dll to run. They are in the bin subdirectory of the corresponding vcpkg packages.

Compile llm with clblast support:

cargo run --release --features clblast -- mpt infer --model-path mpt-7b-chat-q4_0-ggjt.bin -p "Once upon a time"
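For illustration, a hedged build.rs sketch of how llm could consume the two variables (the real build script may differ; the library names and the lib subdirectory are assumptions):

use std::env;
use std::path::PathBuf;

fn main() {
    // Both variables must hold the full paths to the vcpkg packages, as noted above.
    let opencl = PathBuf::from(env::var("OPENCL_PATH").expect("set OPENCL_PATH"));
    let clblast = PathBuf::from(env::var("CLBLAST_PATH").expect("set CLBLAST_PATH"));

    // Point the linker at the vcpkg lib directories and link both libraries.
    println!("cargo:rustc-link-search=native={}", opencl.join("lib").display());
    println!("cargo:rustc-link-search=native={}", clblast.join("lib").display());
    println!("cargo:rustc-link-lib=OpenCL");
    println!("cargo:rustc-link-lib=clblast");
}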
@darxkies darxkies mentioned this pull request Jun 10, 2023
@darxkies
Contributor Author

Windows/CLBlast works. The commit contains the details on how to use it. It needs more testing.

@LLukas22
Contributor

This should be ready. @danforbes, could you check the documentation before we summon the big guy for a review?

And we need someone to test clblast on MacOS 😅

@SlyEcho

SlyEcho commented Jun 12, 2023

hipBLAS is only in the PR, right? It is not in the master branch of llama.cpp as far as I can see.

I am the author of ggerganov/llama.cpp#1087

I can't give you a timeframe for when it could be merged, since it's a bit experimental.

However, you may have some success using hipify-perl or something similar, unless you need to have the CMake definitions in the code.

@pixelspark
Contributor

And we need someone to test clblast on MacOS 😅

I could try this again later; using the environment variables listed earlier, I might be able to fix the issue where it couldn't find CLBlast on my machine.

$ ls ~/.brew/Cellar/clblast/1.6.0
CHANGELOG		LICENSE			bin			lib
INSTALL_RECEIPT.json	README.md		include			share

$ CLBLAST_PATH=~/.brew/Cellar/clblast/1.6.0 cargo run --release --features clblast mpt infer --model-path ~/Downloads/models/mpt-7b-chat-q4_0-ggjt.bin -p "Once upon a time"

That throws a bunch of warnings and one error:

error: failed to run custom build command for `ggml-sys v0.2.0-dev (/Users/tommy/Repos/llm/crates/ggml/sys)`

Caused by:
  process didn't exit successfully: `/Users/tommy/Repos/llm/target/release/build/ggml-sys-2d4c323b8b047b52/build-script-build` (exit status: 1)

*snip*

  cargo:warning=llama-cpp/ggml-opencl.cpp:606:25: error: non-aggregate type 'std::atomic_flag' cannot be initialized with an initializer list
  cargo:warning=static std::atomic_flag g_cl_pool_lock = ATOMIC_FLAG_INIT;

@pixelspark pixelspark mentioned this pull request Jun 13, 2023
@darxkies
Contributor Author

@LLukas22 It seems that is an issue that needs to be fixed in llama.cpp. Should we remove macOS/CLBlast from the compatibility matrix in the documentation and leave the code as it is?

@LLukas22
Contributor

@darxkies I think we can consider CLBlast on macOS out of scope for now; realistically, macOS users will always want to use Metal if possible. If it's fixed in llama.cpp later on, we could enable support via another PR.

Collaborator

@philpax philpax left a comment

Fantastic work! I can't test this at present, but it looks quite comprehensive and I really appreciate the documentation.

Once that comment's added and @LLukas22's happy with everything, I have no problems with him merging it in.

Thanks for the great work! 🚀

.cargo/config Outdated
@@ -0,0 +1,2 @@
[target.x86_64-pc-windows-msvc]
rustflags = ["-Ctarget-feature=+crt-static"]
Collaborator

Can you add a comment about why this is necessary?

Contributor Author

Without that option, the compiler throws warnings when CLBlast is enabled.

Collaborator

Hm, would it be possible to move that to build.rs, have it turn on if and only if it's msvc + clblast, and add a comment? Changing how the CRT is linked can cause global issues, so I'd prefer to reduce the scope of this. (I think you can turn target features on from build.rs)
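A hedged sketch of what that check could look like in build.rs, assuming the C/C++ sources are built with the cc crate (whether cc's static_crt fully replaces the crt-static rustflag here is an assumption, not something verified in this PR):

fn main() {
    // Only adjust the C runtime when targeting MSVC with the clblast feature enabled.
    let is_msvc = std::env::var("CARGO_CFG_TARGET_ENV").map_or(false, |v| v == "msvc");
    let clblast_enabled = std::env::var("CARGO_FEATURE_CLBLAST").is_ok();

    let mut build = cc::Build::new();
    build.file("llama-cpp/ggml.c");
    if is_msvc && clblast_enabled {
        // Compile the C/C++ objects against the static CRT (/MT) instead of
        // relying on a global rustflag in .cargo/config.
        build.static_crt(true);
    }
    build.compile("ggml");
}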

@philpax philpax added the topic:backend-support Support for alternate non-GGML backends, or for particular GGML backend features label Jun 15, 2023
@darxkies
Contributor Author

@LLukas22 In that case, the documentation needs to be updated to remove macOS/CLBlast as supported.

@LLukas22
Contributor

@darxkies I agree with philpax; it would be nice if we could move the flag into build.rs somehow. Then we just update the documentation with a hint that clblast is currently broken on macOS, give this a final test, and merge it.

@darxkies
Contributor Author

I wouldn't add a hint that it is broken; I would just remove it altogether, at least in the documentation.

Regarding the flag, that was the "recommended" way of solving it. I will take a look at it this weekend. Perhaps overriding RUSTFLAGS in build.rs.

Before merging it, it would make sense to update llama.cpp too.

@LLukas22
Contributor

@darxkies OK, the linking stuff doesn't seem to be trivial; the crt-static RFC has a short section about it: https://rust-lang.github.io/rfcs/1721-crt-static.html#customizing-linkage-to-the-c-runtime

Regarding the documentation, I will just remove the clblast section for macOS and mark it as unsupported.

@philpax
Collaborator

philpax commented Jun 16, 2023

Given that, I'd suggest removing the CRT override entirely and moving it to documentation. I suspect that's a decision that's best made by the integrator / application author.

@LLukas22
Contributor

@darxkies I tested my configuration a bit and I can't reproduce the compiler warnings when I remove the .cargo file.
I removed it from the PR and added a disclaimer to the documentation that it may be necessary for certain setups.
What are your thoughts on this?

@darxkies
Contributor Author

Very interesting. If you run cargo clean and then compile with clblast enabled, the warning should be there. I will take a look at it tomorrow.

@darxkies
Contributor Author

I did a git pull, cargo clean, and cargo build --release --features clblast, and this is the warning I got, which I referred to previously:

warning: cl : Command line warning D9025 : overriding '/MD' with '/MT'

@LLukas22 LLukas22 merged commit 3becd72 into rustformers:main Jun 18, 2023
@hhamud hhamud mentioned this pull request Aug 7, 2023