Stochastic Sampling for Speculative Decoding example #5384

Closed
mscheong01 opened this issue Feb 7, 2024 · 2 comments · Fixed by #5625
Labels
enhancement New feature or request

Comments

@mscheong01
Collaborator

Feature Description

It seems that the speculative decoding example in this repo only uses greedy sampling. Are there any plans to support stochastic sampling as well? If not, could I give it a try, based on the paper and the implementations in https://github.com/lucidrains/speculative-decoding?
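For context, here is a minimal sketch of the acceptance rule that stochastic speculative sampling is usually described with: a drafted token x is accepted with probability min(1, p(x)/q(x)), where p is the target model's distribution and q is the draft model's, and on rejection a replacement is sampled from the normalized residual max(0, p - q). This is an illustration in plain Python, not code from this repo; the function name `speculative_accept` and its arguments are hypothetical.

```python
import random


def speculative_accept(draft_token, p_target, q_draft, rng=random):
    """Decide one drafted position (hypothetical helper, for illustration only).

    p_target, q_draft: probability lists over the same vocabulary.
    Returns (token_to_emit, accepted_flag).
    """
    p = p_target[draft_token]
    q = q_draft[draft_token]

    # Accept the drafted token with probability min(1, p/q).
    if q > 0.0 and rng.random() < min(1.0, p / q):
        return draft_token, True

    # Rejected: resample from the residual distribution norm(max(0, p - q)).
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    if sum(residual) <= 0.0:
        # Assumption: if p and q coincide numerically, fall back to the target.
        residual = list(p_target)
    token = rng.choices(range(len(residual)), weights=residual)[0]
    return token, False
```

When the drafted token is itself sampled from q, the emitted token is distributed according to p, which is why this scheme keeps the target model's output distribution while still accepting many draft tokens. Greedy verification is the special case where both distributions are one-hot.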

@mscheong01 mscheong01 added the enhancement New feature or request label Feb 7, 2024
@ggerganov
Member

Yes, I was thinking about adding stochastic sampling to the speculative example, but haven't gotten to it yet. If you want to give it a try, please go ahead.

@BarfingLemurs
Contributor

Any numbers? I had thought normal sampling parameters already work (70B Q4 + 1.1B Q4 = ~1.3x increase), but I don't remember the details.

3 participants