Releases: ngxson/llama.cpp

b4667

07 Feb 15:18
d2fe216
Make logging more verbose (#11714)

Added while debugging an issue for a user who was on a read-only filesystem.

Signed-off-by: Eric Curtin <[email protected]>

b4666

07 Feb 14:55
ed926d8
llama : fix defrag logic (#11707)

* llama : fix defrag logic

ggml-ci

* cont : better logic

ggml-ci

* cont : clamp fragmentation to 0.0

ggml-ci
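
A minimal sketch of the "clamp fragmentation to 0.0" idea, not the actual defrag code: the fragmentation measure of the occupied KV-cache span can come out slightly negative due to bookkeeping, so it is clamped to zero before comparing against the defrag threshold. All names here (`cells_used`, `max_cell_used`, `defrag_thold`) are illustrative assumptions.

```cpp
#include <algorithm>

// Hypothetical names; a sketch of clamping fragmentation to 0.0
// before the threshold check, not llama.cpp's actual implementation.
bool should_defrag(int cells_used, int max_cell_used, float defrag_thold) {
    // fraction of the occupied span that is holes
    float fragmentation = max_cell_used > 0
        ? 1.0f - float(cells_used) / float(max_cell_used)
        : 0.0f;
    // bookkeeping can push this slightly below zero; clamp it
    fragmentation = std::max(0.0f, fragmentation);
    return fragmentation > defrag_thold;
}
```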

b4664

07 Feb 14:30
333820d
llama : fix progress dots (#11730)

* Update llama.cpp

Display progress dots in the terminal. Without this change, progress dots were not shown while loading a model from file.

* Update llama.cpp

Removed trailing spaces.
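
For reference, llama.cpp exposes a progress callback in `llama_model_params`; the fix restores the dot-printing behavior of the internal default callback. The sketch below only illustrates the callback shape from user code, assuming the current API names (`llama_model_load_from_file`, `llama_model_free`):

```cpp
#include "llama.h"
#include <cstdio>

// Print a dot roughly every 5% of load progress.
// llama_progress_callback receives progress in [0, 1];
// returning true tells the loader to continue.
static bool print_dots(float progress, void * user_data) {
    int * last = static_cast<int *>(user_data);
    int cur = static_cast<int>(progress * 20);
    for (; *last < cur; ++*last) {
        fputc('.', stderr);
        fflush(stderr);
    }
    return true; // keep loading
}

int main(int argc, char ** argv) {
    if (argc < 2) return 1;
    int last = 0;
    llama_model_params mparams = llama_model_default_params();
    mparams.progress_callback           = print_dots;
    mparams.progress_callback_user_data = &last;
    llama_model * model = llama_model_load_from_file(argv[1], mparams);
    fputc('\n', stderr);
    if (!model) return 1;
    llama_model_free(model);
    return 0;
}
```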

b4663

07 Feb 11:00
c026ba3
vulkan: print shared memory size (#11719)

b4662

07 Feb 10:15
7ee953a
llama : add llama_sampler_init for safe usage of llama_sampler_free (…
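
A hedged usage sketch of the lifecycle this change supports: samplers created through the stock `llama_sampler_init_*` constructors (and chains from `llama_sampler_chain_init`) are released with `llama_sampler_free`, and the new generic `llama_sampler_init` gives custom samplers the same safe pairing. The example below uses only stock constructors:

```cpp
#include "llama.h"

int main() {
    // Build a sampler chain from stock constructors ...
    llama_sampler * chain =
        llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(chain, llama_sampler_init_greedy());

    // ... use the chain during generation ...

    // ... then release everything with a single free; the chain
    // owns the samplers that were added to it.
    llama_sampler_free(chain);
    return 0;
}
```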

b4661

07 Feb 10:06
ec3bc82
SYCL: remove XMX info from print devices (#11712)

b4660

07 Feb 08:48
b7552cf
common : add default embeddings presets (#11677)

* common : add default embeddings presets

This commit adds default embeddings presets for the following models:
- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server.

For example, with llama-embedding:
```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:
```console
./build/bin/llama-server --embd-gte-small-default
```
The embeddings endpoint can then be called with a POST request:
```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

These may not be the most common embedding models, but hopefully they are a good starting point for discussion and further improvements.

Refs: https://github.com/ggerganov/llama.cpp/issues/10932

b4659

07 Feb 08:11
225bbbf
ggml : optimize and build warning fix for LoongArch (#11709)

* ggml : optimize convert f32<->f16 for loongarch_asx

* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16

* ggml : fix warnings when running CPU CI locally on LoongArch
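
For context, here is a scalar reference of the fp16 -> fp32 half of the conversion that the SIMD path accelerates; this is a hedged sketch of the standard IEEE binary16 decoding, not the loongarch_asx intrinsics from the commit:

```cpp
#include <cstdint>
#include <cstring>

// Scalar reference for IEEE binary16 -> binary32, the operation the
// loongarch_asx SIMD path vectorizes. Illustrative only.
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t man  = h & 0x3FF;
    uint32_t bits;
    if (exp == 0x1F) {                 // inf / NaN
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {             // normal: rebias 15 -> 127
        bits = sign | ((exp + 112) << 23) | (man << 13);
    } else if (man != 0) {             // subnormal: renormalize
        exp = 113;
        while ((man & 0x400) == 0) { man <<= 1; exp--; }
        bits = sign | (exp << 23) | ((man & 0x3FF) << 13);
    } else {                           // signed zero
        bits = sign;
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```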

b4658

06 Feb 22:18
855cd07
llama : fix old glm4 models (#11670)

b4651

06 Feb 11:57
c0d4843
build : fix llama.pc (#11658)

Signed-off-by: Adrien Gallouët <[email protected]>
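
llama.pc is the pkg-config file installed alongside the library; a quick way to verify the fix is to build a trivial program against it. A minimal sketch, assuming llama.cpp was installed (e.g. via `cmake --install`) somewhere pkg-config can find it:

```cpp
// pc_check.cpp - build with:
//   c++ pc_check.cpp $(pkg-config --cflags --libs llama) -o pc_check
#include "llama.h"
#include <cstdio>

int main() {
    // If llama.pc points at the right include and lib dirs,
    // this links and prints the build's system info.
    std::printf("%s\n", llama_print_system_info());
    return 0;
}
```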