Releases · ngxson/llama.cpp

02 Mar 10:12

9bf297a

b2311

workflows : remove nocleanup arg for check-requirements.sh (#5826)

Reduces peak tmpfs usage and should prevent the check from failing due to
running out of space.

Fixes the 'No space left on device' issue mentioned in #5703.

29 Feb 14:10

github-actions

b2296

d5ab297

llama : constified `llama_set_state_data`'s `src` (#5774)

28 Feb 21:10

github-actions

b2295

87c91c0

ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771)

ggml-ci

27 Feb 20:58

github-actions

b2282

cb49e0f

Attempt to fix android build (#5752)

Co-authored-by: Iwan Kawrakow

26 Feb 14:19

github-actions

b2271

67fd331

unicode : reuse iterator (#5726)

25 Feb 21:02

github-actions

b2264

bf08e00

llama : refactor k-shift implementation + KV defragmentation (#5691)

* llama : refactor k-shift implementation

ggml-ci

* llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add

* llama : cont k-shift refactoring + normalize type names

ggml-ci

* minor : fix MPI builds

* llama : reuse n_rot from the build context

ggml-ci

* llama : revert enum name changes from this PR

ggml-ci

* llama : update llama_rope_type

* llama : add comment about rope values

* llama : fix build

* passkey : apply kv cache updates explicitly

ggml-ci

* llama : change name to llama_kv_cache_update()

* llama : add llama_kv_cache_seq_pos_max()

* passkey : fix llama_kv_cache_seq_pos_max() usage

* llama : some llama_kv_cell simplifications

* llama : add llama_kv_cache_compress (EXPERIMENTAL)

* llama : add alternative KV cache merging (EXPERIMENTAL)

* llama : add llama_kv_cache_defrag

* llama : comments

* llama : remove llama_kv_cache_compress

will add in a separate PR

ggml-ci

* llama : defragment via non-overlapping moves

* llama : ggml_graph based defrag implementation

ggml-ci

* llama : switch the loop order in build_defrag

* llama : add comments
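
The "defragment via non-overlapping moves" step can be pictured with a small compaction sketch. This is not the llama.cpp `build_defrag` implementation: the `kv_cell` and `kv_move` types and the `plan_defrag` function are hypothetical, and the two-pointer strategy is an assumption chosen for illustration. The idea is to fill holes near the front of the cache with used cells taken from the back, so every destination was originally empty and every source was originally used, which keeps the moves from overlapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical KV cell: only tracks occupancy and the token position. */
typedef struct {
    bool used;
    int  pos;
} kv_cell;

/* One planned move: copy cell `src` into cell `dst`. */
typedef struct {
    int src;
    int dst;
} kv_move;

/* Compact used cells toward the front of `cells` (length n).
 * Moves are recorded in `moves` (capacity >= n) and applied in place.
 * Destinations are always originally empty cells and sources always
 * originally used cells, so no move reads or writes another move's cell.
 * Returns the number of moves performed. */
int plan_defrag(kv_cell *cells, int n, kv_move *moves) {
    assert(n >= 0);
    int n_moves = 0;
    int i = 0;      /* scans forward for the next hole */
    int j = n - 1;  /* scans backward for the last used cell */
    for (;;) {
        while (i < n && cells[i].used) i++;
        while (j >= 0 && !cells[j].used) j--;
        if (i >= j) break;  /* no hole left before the last used cell */
        moves[n_moves].src = j;
        moves[n_moves].dst = i;
        n_moves++;
        cells[i] = cells[j];  /* apply the move */
        cells[j].used = false;
        cells[j].pos  = -1;
    }
    return n_moves;
}
```

For a cache laid out as used, empty, used, empty, used, the sketch performs a single move (cell 4 into cell 1), leaving the three used cells contiguous at the front. A graph-based implementation would batch such moves into tensor copies rather than moving cells one at a time.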

25 Feb 14:21

github-actions

b2259

930b178

server: logs - unified format and --log-format option (#5700)

* server: logs - always use JSON logger, add thread_id to message, log task_id and slot_id

* server : skip GH copilot requests from logging

* server : change message format of server_log()

* server : no need to repeat log in comment

* server : log style consistency

* server : fix compile warning

* server : fix tests regex patterns on M2 Ultra

* server: logs: PR feedback on log level

* server: logs: allow choosing the log format (JSON or plain text)

* server: tests: output server logs in text

* server: logs switch init logs to server logs macro

* server: logs: ensure JSON value does not raise an error

* server: logs: shorten level VERBOSE to VERB (max 4 chars)

* server: logs: use lower case, like other log messages

* server: logs avoid static in general

Co-authored-by: Georgi Gerganov

* server: logs PR feedback: change text log format to: LEVEL [function_name] message | additional=data

---------

Co-authored-by: Georgi Gerganov
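
The plain-text layout adopted in this PR, `LEVEL [function_name] message | additional=data`, can be produced with a single `snprintf` call. The `format_log_line` helper below is a hypothetical sketch, not the server's actual logging code. It assumes the additional data has already been rendered as a `key=value` string and truncates the level to 4 characters to match the 4-char limit behind the VERBOSE to VERB rename.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter for the plain-text log layout
 *   LEVEL [function_name] message | additional=data
 * `extra` may be NULL or empty; then the " | ..." suffix is omitted.
 * The level is truncated to at most 4 characters via "%.4s".
 * Returns snprintf's result: a value >= size means the line was truncated. */
int format_log_line(char *buf, size_t size,
                    const char *level, const char *function,
                    const char *message, const char *extra) {
    assert(level != NULL && function != NULL && message != NULL);
    if (extra != NULL && strlen(extra) > 0) {
        return snprintf(buf, size, "%.4s [%s] %s | %s",
                        level, function, message, extra);
    }
    return snprintf(buf, size, "%.4s [%s] %s", level, function, message);
}
```

Keeping the structured fields after a single `|` separator leaves the human-readable part first while still letting tools split off the `key=value` data.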

25 Feb 10:19

github-actions

b2254

9e359a4

server: continue to update other slots on embedding concurrent reques…

23 Feb 20:42

github-actions

b2251

fd43d66

server : add KV cache quantization options (#5684)

22 Feb 15:43

github-actions

b2239

3a03541

minor : fix trailing whitespace (#5638)
