llama : add support for GLM-Edge and GLM-Edge-V series models (ggerganov#10573)

* add glm edge chat model
* use config partial_rotary_factor as rope ratio
* support for glm edge model
* vision model support
* remove debug info
* fix format
* llava.cpp trailing whitespace
* remove unused AutoTokenizer
* Update src/llama.cpp for not contain <|end|> or </s>

Co-authored-by: Xuan Son Nguyen <[email protected]>

* add edge template
* fix chat template
* fix conflict
* fix conflict
* fix ci err
* fix format err
* fix template err
* 9b hf chat support
* format
* format clip.cpp
* fix format
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/llava/clip.cpp
* fix format
* minor : style

---------

Co-authored-by: liyuhang <[email protected]>
Co-authored-by: piDack <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: liyuhang <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
1 parent 53debe6 · commit 0cec062 · 15 changed files with 568 additions and 67 deletions
# GLMV-EDGE

Currently this implementation supports [glm-edge-v-2b](https://huggingface.co/THUDM/glm-edge-v-2b) and [glm-edge-v-5b](https://huggingface.co/THUDM/glm-edge-v-5b).

## Usage
Build the project with CMake, or run `make llama-llava-cli` to build it; a minimal CMake invocation is sketched below.
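A sketch of the CMake route, assuming a standard llama.cpp checkout (the target name mirrors the `make` target above; the flags shown are illustrative):

```sh
# Configure the build tree, then build only the llava CLI target
cmake -B build
cmake --build build --config Release --target llama-llava-cli
```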
After building, run `./llama-llava-cli` to see the usage. For example:
```sh
./llama-llava-cli -m model_path/ggml-model-f16.gguf --mmproj model_path/mmproj-model-f16.gguf --image img_path/image.jpg -p "<|system|>\n system prompt <image><|user|>\n prompt <|assistant|>\n"
```
**note**: a lower temperature like 0.1 is recommended for better quality; add `--temp 0.1` to the command to do so.
**note**: for GPU offloading, make sure to use the `-ngl` flag as usual. Both options are combined in the sketch below.
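Putting both notes together, a run might look like this (the paths are placeholders; `-ngl 99` simply requests offloading as many layers as the model has):

```sh
./llama-llava-cli -m model_path/ggml-model-f16.gguf --mmproj model_path/mmproj-model-f16.gguf \
    --image img_path/image.jpg --temp 0.1 -ngl 99 \
    -p "<|system|>\n system prompt <image><|user|>\n prompt <|assistant|>\n"
```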

## GGUF conversion

1. Clone a GLMV-EDGE model ([2B](https://huggingface.co/THUDM/glm-edge-v-2b) or [5B](https://huggingface.co/THUDM/glm-edge-v-5b)). For example:
```sh
git clone https://huggingface.co/THUDM/glm-edge-v-5b
# or
git clone https://huggingface.co/THUDM/glm-edge-v-2b
```
2. Use `glmedge-surgery.py` to split the GLMV-EDGE model into its LLM and multimodal projector constituents:
```sh
python ./examples/llava/glmedge-surgery.py -m ../model_path
```
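Assuming the surgery completes successfully, it should leave a `glm.projector` file in the model directory; the next step consumes that file via `--llava-projector`.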
3. Use `glmedge-convert-image-encoder-to-gguf.py` to convert the GLMV-EDGE image encoder to GGUF:
```sh
python ./examples/llava/glmedge-convert-image-encoder-to-gguf.py -m ../model_path --llava-projector ../model_path/glm.projector --output-dir ../model_path
```
4. Use `examples/convert_hf_to_gguf.py` to convert the LLM part of GLMV-EDGE to GGUF:
```sh
python convert_hf_to_gguf.py ../model_path
```
Now both the LLM part and the image encoder are in the `model_path` directory.
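They can then be passed to `llama-llava-cli` via `-m` and `--mmproj` as in the usage example above (the exact GGUF file names may differ from that example depending on the conversion defaults).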