[FEAT] JSON constrained support #1125
Any idea why the accuracy test is failing? I don't think I've made any changes that could affect accuracy, at least when no json_schema is used.
Thanks for the contribution! I left a few comments.
@havetc can you address the comments and rebase?
Hello! I'm back from a few days of holiday; I'm on it.
* based on regex constrained generation, with another type of cache for json added (to store FSM and regex conversion)
* minor changes to batch scheduler, to avoid edge cases where it could break
e142c61 to 00fd8b4
Please fix the lint error.
@havetc Thanks for the contribution. It is merged.
Hi @havetc, actually a better method is one where we do not need to change anything inside the server. Then all changes in Does it make sense? If so, can you simplify your code with another PR?
@merrymercy But as I understand it, that wouldn't cache the JSON compilation to regex? It is a costly operation that can take several seconds on CPU, so it really makes sense to cache it rather than redo it for each request.
Hi all - thanks for this. I noticed that this doesn't actually expose I opened a PR here #1254 (there may be other areas that need updating, I just wanted to hack together something quickly so we could experiment).
@qeternity
I have been testing here for the past few hours, it's all working well. But @merrymercy will have a better view on whether there are other places that also need to be updated to accept Will have a look at docs and tests later if I have time.
@havetc I see. In this case, we can also add a cache in the TokenizerManager for the JSON -> regex conversion. That way, the conversion even overlaps with GPU computation. I just want to keep the core part in tp_worker as minimal as possible, so we want to remove the duplicated handling of regex and JSON schema.
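To illustrate the caching idea discussed above, here is a minimal sketch of memoizing the JSON -> regex conversion by schema string. It is not the sglang or outlines code: `slow_schema_to_regex` is a hypothetical toy stand-in for the real (costly) converter, it only handles flat objects with integer properties, and `cached_schema_to_regex` and `CALLS` are names invented for this example.

```python
import json
import re
from functools import lru_cache

# Counts how often the (pretend) expensive conversion actually runs.
CALLS = {"count": 0}


def slow_schema_to_regex(schema_str: str) -> str:
    """Hypothetical stand-in for the costly JSON schema -> regex conversion.

    Toy behaviour: only supports a flat object whose properties are integers.
    """
    CALLS["count"] += 1
    props = json.loads(schema_str)["properties"]
    fields = [rf'"{re.escape(name)}": -?\d+' for name in props]
    return r"\{" + ", ".join(fields) + r"\}"


@lru_cache(maxsize=128)
def cached_schema_to_regex(schema_str: str) -> str:
    # The schema is passed as a string so it is hashable and usable as a cache key.
    return slow_schema_to_regex(schema_str)


schema = json.dumps({"type": "object", "properties": {"age": {"type": "integer"}}})
first = cached_schema_to_regex(schema)   # runs the conversion
second = cached_schema_to_regex(schema)  # served from the cache
```

With this shape, a repeated request carrying the same schema pays the conversion cost only once, which is the point raised in the thread.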
Co-authored-by: Yineng Zhang <[email protected]>
Motivation
Many LLM APIs (Together AI, Fireworks, Anyscale...) and other engines (vLLM...) support constrained generation with a JSON schema. As outlines is already a dependency of sglang, it is straightforward to extend its usage to directly support JSON schema in the API.
Modification
* Adding a json_schema parameter (in the sampling params for sglang, and as an extra parameter of CompletionRequest/ChatCompletionRequest for the OpenAI-compatible server).
* Adding a new FSMJsonCache for JSON. It inherits from FSMCache, so it works the same way, but it also stores the regex string converted by outlines. This regex string is required by the Jump Forward Cache.
* Adding a unit test, and updating the sampling params documentation.
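The cache described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `BaseCache`, `RegexCache` and `JsonSchemaCache` are hypothetical names, a compiled `re` pattern stands in for the FSM, and the schema-to-regex converter is injected as a plain function rather than calling outlines.

```python
import re


class BaseCache:
    """Minimal compile-once cache (a stand-in, not sglang's actual base class)."""

    def __init__(self):
        self._store = {}

    def query(self, key):
        # Build the value on first use, then reuse it for later requests.
        if key not in self._store:
            self._store[key] = self.init_value(key)
        return self._store[key]

    def init_value(self, key):
        raise NotImplementedError


class RegexCache(BaseCache):
    """Maps a regex string to its compiled form (stand-in for an FSM)."""

    def init_value(self, regex):
        return re.compile(regex)


class JsonSchemaCache(RegexCache):
    """Maps a JSON schema string to a (regex string, compiled form) pair."""

    def __init__(self, schema_to_regex):
        super().__init__()
        # Assumption: the converter is any callable from schema string to regex string.
        self._schema_to_regex = schema_to_regex

    def init_value(self, schema):
        regex = self._schema_to_regex(schema)
        # Keep the regex string too, since other components may need it.
        return regex, super().init_value(regex)


# Toy converter: every schema maps to a regex matching an empty JSON object.
cache = JsonSchemaCache(lambda schema: r"\{\}")
regex, fsm = cache.query('{"type": "object"}')
```

Returning the regex string alongside the compiled object mirrors the reason given in the list above: a downstream consumer such as the Jump Forward Cache needs the regex itself, not only the FSM.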
Checklist
pre-commit run --all-files or other linting tools are used to fix potential lint issues.