Static lookup decoding #5398
Conversation
I had an idea for an alternative approach to lookup decoding. Instead of looking at the context, I'm extracting token sequences from a static corpus of text and then using the most common sequences in that corpus to construct a draft. It somewhat works, but honestly the results are not very good. With predictions based on the previous 2 tokens, wikitext train as the static text corpus, and a prompt that generates a story, the acceptance rate of the draft is only ~10%.

With the commands

I get this output:

It would be great if it turns out I just did the implementation wrong, but my intuition is that language is just not as easily predictable as I had hoped. If there is a desire to turn this into a proper PR I could do it, but I personally believe this has more value as a negative result, i.e. to prevent other devs from wasting their time on this approach.

I forgot to say: with a prompt like "[INST] Explain to me how a type Ia supernova occurs. [/INST]" that would result in an output more similar to the text corpus, the acceptance rate is also only ~10%. And for those cases the lookup decoding that is already on master works a lot better (~50% acceptance rate) because there is a lot more repetition.
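For illustration, a minimal C++ sketch of the static-lookup idea described above. This is not the code in this PR; the names (`llama_token`, `pack2`, `ngram_cache`, `build_cache`, `build_draft`) are stand-ins. It maps every 2-token context in a fixed corpus to the counts of the tokens that follow it, then drafts greedily from the most frequent follower:

```cpp
// Hypothetical sketch of the static-lookup idea (stand-in names, not this
// PR's actual code): map every 2-token context in a fixed corpus to the
// counts of the tokens that follow it, then draft greedily.
#include <cstdint>
#include <unordered_map>
#include <vector>

using llama_token = int32_t; // token ids are 32-bit ints in llama.cpp

// pack a 2-token context into a single 64-bit hashmap key
static uint64_t pack2(llama_token a, llama_token b) {
    return (uint64_t(uint32_t(a)) << 32) | uint64_t(uint32_t(b));
}

// context -> (next token -> count), filled once from the static corpus
typedef std::unordered_map<uint64_t, std::unordered_map<llama_token, int>> ngram_cache;

static ngram_cache build_cache(const std::vector<llama_token> & corpus) {
    ngram_cache cache;
    for (size_t i = 2; i < corpus.size(); ++i) {
        cache[pack2(corpus[i-2], corpus[i-1])][corpus[i]]++;
    }
    return cache;
}

// extend the last two tokens of the context with the most common follower,
// stopping once the current 2-token context was never seen in the corpus
static std::vector<llama_token> build_draft(
        const ngram_cache & cache, llama_token t0, llama_token t1, int n_draft) {
    std::vector<llama_token> draft;
    for (int i = 0; i < n_draft; ++i) {
        auto it = cache.find(pack2(t0, t1));
        if (it == cache.end()) {
            break;
        }
        llama_token best       = -1;
        int         best_count = 0;
        for (const auto & kv : it->second) {
            if (kv.second > best_count) {
                best       = kv.first;
                best_count = kv.second;
            }
        }
        draft.push_back(best);
        t0 = t1;
        t1 = best;
    }
    return draft;
}
```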
Having a custom ngram cache that's dynamically adjusted based on what the user generates locally should significantly improve the acceptance rate. I wrote a bit about this here: #4235
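A minimal sketch of what such a dynamically adjusted cache could look like, reusing the hypothetical `ngram_cache` and `pack2` helpers from the sketch above (again an illustration under those assumptions, not the actual llama.cpp implementation). Each newly generated token adds one (context, follower) observation, so the statistics adapt to the local output:

```cpp
// Hypothetical: update the cache with the newest locally generated token so
// the (context -> follower) counts track what the user actually produces.
// Reuses the ngram_cache and pack2 stand-ins from the sketch above.
static void update_cache(ngram_cache & cache, const std::vector<llama_token> & ctx) {
    if (ctx.size() < 3) {
        return;
    }
    const size_t i = ctx.size() - 1; // only the newest token adds an observation
    cache[pack2(ctx[i-2], ctx[i-1])][ctx[i]]++;
}
```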
Okay, it seems my implementation had a bug where one of the hashmaps wasn't being updated correctly. With the fix, and an additional filter that only accepts those sequences which have a relative frequency of >= 50%, I get much better results: ~28% acceptance rate with the story prompt and ~24% with the factual prompt. This is potentially something that could be workable. However, the correctly predicted tokens only make up ~5% of the generated text, so the maximum theoretical speedup is still low. But maybe this technique could be combined with the lookup decoding implementation on master to get more token predictions.
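The >= 50% relative-frequency filter described here could look roughly like the following, again reusing the hypothetical helpers from the first sketch rather than the PR's actual code. A token is only drafted if it accounts for at least half of all observed continuations of its 2-token context:

```cpp
// Hypothetical: emit a draft token only if its count is >= 50% of all
// continuations seen for this 2-token context; otherwise draft nothing.
static bool pick_draft_token(const ngram_cache & cache,
                             llama_token t0, llama_token t1, llama_token & out) {
    auto it = cache.find(pack2(t0, t1));
    if (it == cache.end()) {
        return false;
    }
    int         total      = 0;
    int         best_count = 0;
    llama_token best       = -1;
    for (const auto & kv : it->second) {
        total += kv.second;
        if (kv.second > best_count) {
            best       = kv.first;
            best_count = kv.second;
        }
    }
    if (2*best_count < total) {
        return false; // relative frequency below 50%
    }
    out = best;
    return true;
}
```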
Obsoleted by #5479.