Static lookup decoding #5398

Closed


JohannesGaessler (Collaborator)

I had an idea for an alternative approach to lookup decoding. Instead of looking at the context, I'm extracting token sequences from a static corpus of text and then using the most common sequences in that corpus to construct a draft. It somewhat works, but honestly the results are not very good. With predictions based on the previous 2 tokens, wikitext train as the static text corpus, and a prompt that generates a story, the acceptance rate of the draft is only ~10%.
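For illustration, here is a minimal C++ sketch of the idea described above, assuming a 2-token context. This is not the code on this branch; `llama_token`, `build_cache`, and `draft_tokens` are hypothetical stand-ins:

```cpp
// Minimal sketch: build a table mapping the previous 2 tokens to counts of the
// token that followed them in a static corpus, then draft the most frequent
// continuation. llama_token is a stand-in for the type defined in llama.h.
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

using llama_token = int32_t;

// (token[i-2], token[i-1]) -> observed next tokens with their counts
using ngram_cache = std::map<std::pair<llama_token, llama_token>,
                             std::unordered_map<llama_token, int>>;

// Scan the tokenized corpus once and count which token follows each 2-token context.
static ngram_cache build_cache(const std::vector<llama_token> & corpus) {
    ngram_cache cache;
    for (size_t i = 2; i < corpus.size(); ++i) {
        cache[{corpus[i-2], corpus[i-1]}][corpus[i]]++;
    }
    return cache;
}

// Draft up to n_draft tokens by repeatedly taking the most common continuation
// of the last two tokens; stop early if a context was never seen in the corpus.
static std::vector<llama_token> draft_tokens(const ngram_cache & cache,
                                             llama_token t0, llama_token t1, int n_draft) {
    std::vector<llama_token> draft;
    for (int i = 0; i < n_draft; ++i) {
        const auto it = cache.find({t0, t1});
        if (it == cache.end()) {
            break;
        }
        llama_token best  = -1;
        int         count = 0;
        for (const auto & [tok, cnt] : it->second) {
            if (cnt > count) {
                best  = tok;
                count = cnt;
            }
        }
        draft.push_back(best);
        t0 = t1;
        t1 = best;
    }
    return draft;
}
```

In this sketch the draft simply stops as soon as the current 2-token context was never seen in the corpus, which keeps incorrect drafts short.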

With the commands

export model_name=mixtral_instruct-8x7b && export quantization=q8_0
./lookup-static --model models/opt/${model_name}-${quantization}.gguf -ngl 99 --ctx-size 4096 --n-predict 1024 --seed 1337 --draft 1 --color --prompt "[INST] Write a love story about two stars that tragically ends in a type Ia supernova. Use a lot of eotional and dramatic language. [/INST]"

I get this output:

[Screenshot: generated output for the story prompt]

It would be great if it turns out I just did the implementation wrong, but my intuition is that language is just not as easily predictable as I had hoped. If there is a desire to turn this into a proper PR I could do it, but I personally believe this has more value as a negative result, i.e. to prevent other devs from wasting their time on this approach.

JohannesGaessler (Collaborator, Author)

I forgot to say: with a prompt like "[INST] Explain to me how a type Ia supernova occurs. [/INST]", which results in an output more similar to the text corpus, the acceptance rate is also only ~10%:

[Screenshot: generated output for the factual prompt]

And for those cases the lookup decoding that is already on master works a lot better (~50% acceptance rate) because there is a lot more repetition.

ggerganov (Member)

Having a custom ngram cache that is dynamically adjusted based on what the user generates locally should significantly improve the acceptance rate. I wrote a bit about this here: #4235
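A hedged sketch of what such a dynamic adjustment could look like, reusing the hypothetical `ngram_cache` and `llama_token` types from the sketch above (this is not the implementation discussed in #4235):

```cpp
// Fold newly generated tokens back into the cache so that locally repeated
// phrasing starts to dominate the counts for its 2-token contexts.
// `ctx` is the full token sequence generated so far; only the newest trigram
// needs to be counted after each accepted token.
static void update_cache(ngram_cache & cache, const std::vector<llama_token> & ctx) {
    if (ctx.size() < 3) {
        return;
    }
    const size_t i = ctx.size() - 1;
    cache[{ctx[i-2], ctx[i-1]}][ctx[i]]++;
}
```

Called after every accepted token, this would gradually bias drafts towards the user's own output.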

JohannesGaessler (Collaborator, Author)

Okay, it seems my implementation had a bug where one of the hashmaps wasn't being updated correctly. With the fix, and with an additional filter that only accepts sequences that have a relative frequency of >= 50%, I get much better results:

[Screenshot: output after the fix and the frequency filter]

With the story prompt I get a ~28% acceptance rate, and ~24% with the factual prompt. This is potentially something that could be workable. The correctly predicted tokens only make up ~5% of the generated text though, so the maximum theoretical speedup is still low. But maybe you could combine this technique with the lookup decoding implementation on master to get more token predictions.
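A rough sketch of such a relative-frequency filter, again using the hypothetical `ngram_cache` from the earlier sketch rather than the actual code on this branch: a token is only drafted if it makes up at least half of all continuations observed after the current 2-token context.

```cpp
// Return true and write the draft token to `out` only if the most common
// continuation of (t0, t1) accounts for >= 50% of all observed continuations.
static bool pick_draft_token(const ngram_cache & cache,
                             llama_token t0, llama_token t1, llama_token & out) {
    const auto it = cache.find({t0, t1});
    if (it == cache.end()) {
        return false;
    }
    int best_count = 0;
    int total      = 0;
    for (const auto & [tok, cnt] : it->second) {
        total += cnt;
        if (cnt > best_count) {
            best_count = cnt;
            out        = tok;
        }
    }
    return 2*best_count >= total;
}
```

Requiring a clear majority like this trades draft length for acceptance rate: fewer tokens get drafted, but the ones that do are much more likely to be accepted.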

JohannesGaessler (Collaborator, Author)

It seems that the size of the text corpus makes a large difference. I used wikitext-103 (~50x larger) instead of wikitext-2, and the results are much better:

[Screenshot: output with wikitext-103 as the corpus]

The acceptance rate has increased to ~50% and ~10% of the final result consists of correctly predicted tokens.

JohannesGaessler (Collaborator, Author)

Obsoleted by #5479.
