[Feature] Support llguidance for constrained decoding #3298
Conversation
Hi @Ying1123 @merrymercy @zhyncs |
ok I'll help take a look asap. Thanks for your contribution! |
thanks @zhyncs, really appreciate it |
Hi @zhyncs, any chance we could get a brief review here? We'd like to deploy guidance + sglang for some of our users, and hopefully also deliver benefits to the sglang community! Just some pointers on things you'd like to see changed or improved would help us make sure we're working in the right spirit 🙂 |
@Harsha-Nori Ah Sorry for the delayed response. I've been busy lately. We will review soon. Thank you for your understanding! BTW @JC1DA, can you help resolve the conflicts? Thanks! |
@JC1DA Could you fix the conflicts first? |
hi @zhyncs @zhaochenyang20, fixed :) thanks for taking a look |
@JC1DA Hey. Why should we keep a fixed x-grammar version? We will use x-grammar as the default backend in the next PR. |
@zhaochenyang20 this PR doesn't seem to change xgrammar version used, was the comment for a different PR? |
thanks @zhaochenyang20, just fixed the recent conflict |
Important dependencies like transformers should not be fixed in this PR. Should we use a fixed transformers version?
I merged it from the main branch; we didn't add transformers. We only added the llguidance dependency. @zhaochenyang20 Can you help rerun the workflows? I just reran pre-commit on pyproject.toml |
@shuaills there is no cache; the grammars are compiled every time. It takes ~2ms on average on a large JSON schema test suite, with a p99.9 of under 40ms (single-threaded); see https://github.com/guidance-ai/jsonschemabench/tree/main/maskbench |
Why don't we need a cache here? Can you share some insights? Thanks. @mmoskal |
@shuaills there is some pre-computation, but it's per-tokenizer and per grammar type (in this case JSON), not per grammar; anyway, it's not very heavy. There are more details here: https://github.com/guidance-ai/llguidance/blob/main/docs/optimizations.md |
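To make the caching question concrete, here is a minimal sketch of what a per-grammar compilation cache could look like if one were added on the sglang side. The `compile_grammar` function below is a hypothetical stand-in, not llguidance's actual API; per the discussion above, per-grammar compilation is cheap (~2ms on average), so such a cache may not be worth the extra complexity.

```python
from functools import lru_cache
from typing import Any, Tuple


def compile_grammar(tokenizer_id: str, grammar_type: str, grammar_spec: str) -> Any:
    """Hypothetical stand-in for the per-grammar compilation step.

    In llguidance the heavier, reusable pre-computation is per-tokenizer and
    per grammar type, while compiling an individual grammar is cheap.
    """
    return (tokenizer_id, grammar_type, grammar_spec)  # placeholder object


@lru_cache(maxsize=256)
def get_compiled_grammar(tokenizer_id: str, grammar_type: str, grammar_spec: str) -> Any:
    # Cache keyed on (tokenizer, grammar type, grammar text); reuse avoids
    # recompiling when the same schema is requested repeatedly.
    return compile_grammar(tokenizer_id, grammar_type, grammar_spec)
```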
LGTM @zhaochenyang20
Sure, I just did the rebase |
@shuaills Shuai, if it looks good to you, give an approval. I will merge it today. |
Hi @zhaochenyang20, are we able to merge now, or is there anything I should do to help? |
Motivation
This pull request integrates the llguidance backend to extend sglang's guided decoding capabilities.
The llguidance backend supports regex, JSON schema, and grammars (Lark or EBNF).
We have just released a large JSON Schema benchmark and a paper. Of particular interest might be the isolated mask-generation benchmarks comparing LLGuidance, Outlines, XGrammar, and llama.cpp grammars.
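For illustration, here is a minimal client-side sketch of requesting JSON-schema-guided generation from a running sglang server. It assumes the server was launched with the llguidance backend enabled (e.g. a flag along the lines of `--grammar-backend llguidance`, per this PR) and that the native `/generate` endpoint accepts `json_schema` in `sampling_params`, as with the existing guided-decoding backends; treat the exact flag and parameter names as assumptions rather than the PR's final interface.

```python
import json

import requests

# Schema the constrained decoder should enforce on the model output.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

resp = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Extract the person as JSON: Alice is 31 years old.\n",
        "sampling_params": {
            "max_new_tokens": 64,
            "temperature": 0,
            # Passed as a JSON string, matching sglang's existing guided-decoding usage.
            "json_schema": json.dumps(schema),
        },
    },
)
print(resp.json()["text"])
```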
Modifications
Checklist