llama : add early return for empty range (ggerganov#8327)
* llama : add early return for empty range

This commit adds an early return to the llama_kv_cache_seq_add and
llama_kv_cache_seq_div functions.

The motivation is to avoid looping over the cache when the range is
empty (p0 == p1). I ran into this when using the self-extend feature
in main.cpp.

Signed-off-by: Daniel Bevenius <[email protected]>

* llama : add static_cast to fix CI warning/error

This commit attempts to fix the following warning/error:

```console
src/llama.cpp:7271:31: error:
comparison of integer expressions of different signedness:
‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Werror=sign-compare]
 7271 |                         if (i < hparams.n_layer_dense_lead) {
      |                             ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
This can be reproduced locally by setting -Wsign-compare in the
Makefile.

Signed-off-by: Daniel Bevenius <[email protected]>
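
For readers unfamiliar with -Wsign-compare, here is a minimal sketch of the kind of fix described above. The struct `toy_hparams` and the function `count_dense_lead_layers` are hypothetical stand-ins, not code from the repository, and which operand the actual commit casts is an assumption; only the field name `n_layer_dense_lead` and the `int` vs `uint32_t` comparison come from the warning text.

```cpp
#include <cstdint>

// Hypothetical stand-in for the hyperparameter struct; only the
// field named in the warning is taken from the source.
struct toy_hparams {
    uint32_t n_layer            = 4;
    uint32_t n_layer_dense_lead = 1;
};

// Counts how many layers fall inside the dense lead, using a signed
// loop index as in the code that produced the warning.
static int count_dense_lead_layers(const toy_hparams & hparams) {
    int n_dense = 0;
    for (int i = 0; i < static_cast<int>(hparams.n_layer); ++i) {
        // Writing `i < hparams.n_layer_dense_lead` here compares int
        // with uint32_t and trips -Wsign-compare. Casting the
        // non-negative index makes both operands unsigned.
        if (static_cast<uint32_t>(i) < hparams.n_layer_dense_lead) {
            n_dense++;
        }
    }
    return n_dense;
}
```

The cast is safe in this sketch because the loop index never goes negative; for an index that could be negative, a signed comparison would be the better choice.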

* squash! llama : add early return for empty range

Remove the setting of cache.head to 0 when the range is empty.

Signed-off-by: Daniel Bevenius <[email protected]>

* Update src/llama.cpp

---------

Signed-off-by: Daniel Bevenius <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
2 people authored and Nexesenex committed Jul 7, 2024
1 parent b14e548 commit 9dd9560
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions llama.cpp
@@ -3299,6 +3299,8 @@ static void llama_kv_cache_seq_add(

if (p0 < 0) p0 = 0;
if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();
// If there is no range then return early to avoid looping over the cache.
if (p0 == p1) return;

if (cache.recurrent) {
// for Mamba-like models, only the pos needs to be shifted
@@ -3343,6 +3345,8 @@ static void llama_kv_cache_seq_div(
int d) {
if (p0 < 0) p0 = 0;
if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();
// If there is no range then return early to avoid looping over the cache.
if (p0 == p1) return;

if (cache.recurrent) {
// for Mamba-like models, only the pos needs to be changed
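
The effect of the early return can be sketched on a toy cache. Everything prefixed with `toy_` is a hypothetical simplification and not the real llama.cpp data structure; the half-open range check `[p0, p1)` and the `cells_visited` counter are assumptions made so the saved work is observable.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <set>
#include <vector>

using llama_pos    = int32_t;
using llama_seq_id = int32_t;

// Hypothetical, simplified cache cell: one position plus the set of
// sequences that reference it.
struct toy_kv_cell {
    llama_pos              pos = -1;
    std::set<llama_seq_id> seq_id;
};

struct toy_kv_cache {
    std::vector<toy_kv_cell> cells;
    size_t cells_visited = 0; // instrumentation for this sketch only
};

// Shifts the positions of all cells of `seq_id` that lie in [p0, p1)
// by `delta`. Negative bounds mean "from the start" / "to the end".
static void toy_kv_cache_seq_add(
        toy_kv_cache & cache,
        llama_seq_id   seq_id,
        llama_pos      p0,
        llama_pos      p1,
        llama_pos      delta) {
    if (p0 < 0) p0 = 0;
    if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();
    // An empty range cannot match any cell, so skip the scan entirely.
    if (p0 == p1) return;

    for (auto & cell : cache.cells) {
        ++cache.cells_visited;
        if (cell.seq_id.count(seq_id) > 0 && cell.pos >= p0 && cell.pos < p1) {
            cell.pos += delta;
        }
    }
}
```

Calling `toy_kv_cache_seq_add(cache, 0, 2, 2, 5)` returns before touching any cell, while a non-empty range such as `(1, 3)` scans every cell. The check sits after the negative-bound clamping, so `p0 = 0, p1 = -1` is still treated as the full range rather than an empty one.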