Stochastic Sampling for Speculative Decoding example #5384

mscheong01 · 2024-02-07T08:05:20Z

Feature Description

It seems that the speculative decoding example in this repo only uses greedy sampling. Are there any plans to support stochastic sampling as well? If not, could I give it a try, based on the paper and the implementations in https://github.com/lucidrains/speculative-decoding?
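For context, here is a minimal sketch of the accept/reject rule that stochastic speculative sampling is usually described with: a drafted token is accepted with probability min(1, p/q), and on rejection a replacement is drawn from the normalized residual max(0, p - q). This is not the code in this repo or in the linked repository; the function name `accept_or_resample` and its parameters are hypothetical, and the distributions are assumed to be plain probability lists over the same vocabulary.

```python
import random


def accept_or_resample(draft_token, p_target, q_draft, rng=random):
    """One verification step of stochastic speculative sampling (illustrative sketch).

    draft_token: vocabulary index that the draft model sampled from q_draft.
    p_target:    target-model probabilities over the vocabulary.
    q_draft:     draft-model probabilities over the same vocabulary.
    Returns (token, accepted).
    """
    p = p_target[draft_token]
    q = q_draft[draft_token]

    # Accept the drafted token with probability min(1, p / q).
    # q <= 0 should not occur for a token actually drawn from q_draft;
    # this sketch treats it as a rejection.
    if q > 0.0 and rng.random() < min(1.0, p / q):
        return draft_token, True

    # Rejected: resample from the residual distribution max(0, p - q).
    # rng.choices normalizes the weights internally.
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    if sum(residual) <= 0.0:
        # Degenerate case (no positive residual): fall back to the target distribution.
        return rng.choices(range(len(p_target)), weights=p_target)[0], False
    return rng.choices(range(len(residual)), weights=residual)[0], False
```

Under these assumptions, the token returned by this step is distributed according to `p_target` whenever `draft_token` was drawn from `q_draft`, which is why the scheme can speed up decoding without changing the output distribution. Greedy verification, by contrast, only checks whether the drafted token equals the target model's argmax.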

ggerganov · 2024-02-07T08:08:10Z

Yes, I was thinking about adding stochastic sampling to the speculative example, but haven't gotten to it yet. If you want to give it a try, please go ahead.

BarfingLemurs · 2024-02-26T17:58:49Z

Any numbers? I had thought normal sampling parameters already work (70B Q4 + 1.1B Q4 = ~1.3x increase), but I don't remember the details.

mscheong01 added the enhancement New feature or request label Feb 7, 2024

mscheong01 mentioned this issue Feb 21, 2024

Implement stochastic speculative sampling #5625

Merged

ggerganov closed this as completed in #5625 Mar 4, 2024

