Releases: ngxson/llama.cpp

b4667

07 Feb 15:18
d2fe216
Make logging more verbose (#11714)

Added while debugging an issue for a user who was on a read-only filesystem.

Signed-off-by: Eric Curtin <[email protected]>

b4666

07 Feb 14:55
ed926d8
llama : fix defrag logic (#11707)

* llama : fix defrag logic

ggml-ci

* cont : better logic

ggml-ci

* cont : clamp fragmentation to 0.0

ggml-ci
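
A minimal sketch of the "clamp fragmentation to 0.0" idea, not the actual defrag code: the fragmentation measure of the occupied KV-cache span can come out slightly negative due to bookkeeping, so it is clamped to zero before comparing against the defrag threshold. All names here (`cells_used`, `max_cell_used`, `defrag_thold`) are illustrative assumptions.

```cpp
#include <algorithm>

// Hypothetical names; a sketch of clamping fragmentation to 0.0
// before the threshold check, not llama.cpp's actual implementation.
bool should_defrag(int cells_used, int max_cell_used, float defrag_thold) {
    // fraction of the occupied span that is holes
    float fragmentation = max_cell_used > 0
        ? 1.0f - float(cells_used) / float(max_cell_used)
        : 0.0f;
    // bookkeeping can push this slightly below zero; clamp it
    fragmentation = std::max(0.0f, fragmentation);
    return fragmentation > defrag_thold;
}
```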

b4664

07 Feb 14:30
333820d
llama : fix progress dots (#11730)

* Update llama.cpp

Display progress dots in the terminal. Without this change, progress dots were not shown while loading a model from file.

* Update llama.cpp

Removed trailing spaces.
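
For reference, llama.cpp exposes a progress callback in `llama_model_params`; the fix restores the dot-printing behavior of the internal default callback. The sketch below only illustrates the callback shape from user code, assuming the current API names (`llama_model_load_from_file`, `llama_model_free`):

```cpp
#include "llama.h"
#include <cstdio>

// Print a dot roughly every 5% of load progress.
// llama_progress_callback receives progress in [0, 1];
// returning true tells the loader to continue.
static bool print_dots(float progress, void * user_data) {
    int * last = static_cast<int *>(user_data);
    int cur = static_cast<int>(progress * 20);
    for (; *last < cur; ++*last) {
        fputc('.', stderr);
        fflush(stderr);
    }
    return true; // keep loading
}

int main(int argc, char ** argv) {
    if (argc < 2) return 1;
    int last = 0;
    llama_model_params mparams = llama_model_default_params();
    mparams.progress_callback           = print_dots;
    mparams.progress_callback_user_data = &last;
    llama_model * model = llama_model_load_from_file(argv[1], mparams);
    fputc('\n', stderr);
    if (!model) return 1;
    llama_model_free(model);
    return 0;
}
```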

b4663

07 Feb 11:00
c026ba3
vulkan: print shared memory size (#11719)

b4662

07 Feb 10:15
7ee953a
llama : add llama_sampler_init for safe usage of llama_sampler_free (…
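
A hedged usage sketch of the lifecycle this change supports: samplers created through the stock `llama_sampler_init_*` constructors (and chains from `llama_sampler_chain_init`) are released with `llama_sampler_free`, and the new generic `llama_sampler_init` gives custom samplers the same safe pairing. The example below uses only stock constructors:

```cpp
#include "llama.h"

int main() {
    // Build a sampler chain from stock constructors ...
    llama_sampler * chain =
        llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(chain, llama_sampler_init_greedy());

    // ... use the chain during generation ...

    // ... then release everything with a single free; the chain
    // owns the samplers that were added to it.
    llama_sampler_free(chain);
    return 0;
}
```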

b4661

07 Feb 10:06
ec3bc82
SYCL: remove XMX info from print devices (#11712)

b4660

07 Feb 08:48
b7552cf
common : add default embeddings presets (#11677)

* common : add default embeddings presets

This commit adds default embeddings presets for the following models:
- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server.

For example, with llama-embedding:
```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:
```console
./build/bin/llama-server --embd-gte-small-default
```
The embeddings endpoint can then be called with a POST request:
```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

These may not be the most common embedding models, but hopefully they are a good starting point for discussion and further improvements.

Refs: https://github.com/ggerganov/llama.cpp/issues/10932

b4659

07 Feb 08:11
225bbbf
ggml : optimize and build warning fix for LoongArch (#11709)

* ggml : optimize convert f32<->f16 for loongarch_asx

* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16

* ggml : fix warnings when running CPU CI locally on LoongArch
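
For context, here is a scalar reference of the fp16 -> fp32 half of the conversion that the SIMD path accelerates; this is a hedged sketch of the standard IEEE binary16 decoding, not the loongarch_asx intrinsics from the commit:

```cpp
#include <cstdint>
#include <cstring>

// Scalar reference for IEEE binary16 -> binary32, the operation the
// loongarch_asx SIMD path vectorizes. Illustrative only.
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t man  = h & 0x3FF;
    uint32_t bits;
    if (exp == 0x1F) {                 // inf / NaN
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {             // normal: rebias 15 -> 127
        bits = sign | ((exp + 112) << 23) | (man << 13);
    } else if (man != 0) {             // subnormal: renormalize
        exp = 113;
        while ((man & 0x400) == 0) { man <<= 1; exp--; }
        bits = sign | (exp << 23) | ((man & 0x3FF) << 13);
    } else {                           // signed zero
        bits = sign;
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```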

b4658

06 Feb 22:18
855cd07
llama : fix old glm4 models (#11670)

b4651

06 Feb 11:57
c0d4843
build : fix llama.pc (#11658)

Signed-off-by: Adrien Gallouët <[email protected]>
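
llama.pc is the pkg-config file installed alongside the library; a quick way to verify the fix is to build a trivial program against it. A minimal sketch, assuming llama.cpp was installed (e.g. via `cmake --install`) somewhere pkg-config can find it:

```cpp
// pc_check.cpp - build with:
//   c++ pc_check.cpp $(pkg-config --cflags --libs llama) -o pc_check
#include "llama.h"
#include <cstdio>

int main() {
    // If llama.pc points at the right include and lib dirs,
    // this links and prints the build's system info.
    std::printf("%s\n", llama_print_system_info());
    return 0;
}
```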