
[FEAT] JSON constrained support #1125

Merged: 9 commits into sgl-project:main on Aug 26, 2024

Conversation

@havetc (Contributor) commented Aug 16, 2024

Motivation

A lot of LLM APIs (Together AI, Fireworks, Anyscale...) and other engines (vLLM...) support constrained generation with a JSON schema. As outlines is already a dependency of sglang, it is straightforward to extend its usage to directly support JSON schemas in the API.

Modification

Adding a json_schema parameter (in the sampling params for sglang, and as an extra parameter on CompletionRequest/ChatCompletionRequest for the OpenAI-compatible server).
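
For illustration, a minimal sketch (not from this PR) of what a request with this extra parameter could look like against the OpenAI-compatible server; the endpoint URL, model name, and passing the schema as a JSON string are assumptions here:

```python
# Hedged sketch: sending json_schema as an extra body field to the
# OpenAI-compatible server. The URL, model name, and the schema being
# passed as a JSON string are illustrative assumptions.
import json

import openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Describe a person as JSON."}],
    extra_body={"json_schema": json.dumps(schema)},  # the new extra parameter
)
print(response.choices[0].message.content)  # output constrained to the schema
```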

Adding a new FSMJsonCache for JSON. It inherits from FSMCache, so it functions the same way, but it additionally stores the regex string produced by outlines' conversion. This regex string is required by the Jump Forward Cache.
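
A minimal sketch of that idea (not the PR's actual code; the import paths and the init_value hook are assumptions about the cache interface):

```python
# Hedged sketch: a JSON variant of the FSM cache that also keeps the
# outlines-converted regex string around, since the Jump Forward Cache
# needs the regex, not just the compiled FSM.
from outlines.fsm.json_schema import build_regex_from_schema  # name varies across outlines versions

from sglang.srt.constrained.fsm_cache import FSMCache  # assumed base class location


class FSMJsonCache(FSMCache):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # json schema string -> regex string, reused by the jump-forward logic
        self.schema_to_regex: dict[str, str] = {}

    def init_value(self, json_schema: str):
        regex = build_regex_from_schema(json_schema)  # costly: can take seconds on CPU
        self.schema_to_regex[json_schema] = regex
        return super().init_value(regex)  # build the FSM from the regex as before
```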

Adding a unit test, and updating the sampling params documentation.

Checklist

  • Before submitting a PR for review, make sure it has passed verification in your local development environment. (Mine is a limited env with only 24 GB VRAM: some out-of-memory failures in the test suite, but no functional errors.)
  • Ensure pre-commit (pre-commit run --all-files) or other linting tools are used to fix potential lint issues.
  • Confirm that modifications are covered by complete unit tests. If not, please add more unit tests for correctness.
  • Modify documentation as needed, such as docstrings or example tutorials.

@havetc (Contributor, Author) commented Aug 16, 2024

Any idea why the accuracy test is failing? I don't think I've made any changes that could impact accuracy, at least when no json_schema is used.

@Ying1123 Ying1123 self-assigned this Aug 17, 2024
@Ying1123 Ying1123 mentioned this pull request Aug 17, 2024
@zhyncs zhyncs changed the title [FEAT] Json constrained support [FEAT] JSON constrained support Aug 17, 2024
@merrymercy (Contributor) left a review comment

Thanks for the contribution! I left a few comments.

@merrymercy (Contributor) commented

@havetc can you address the comments and rebase?

@havetc (Contributor, Author) commented Aug 26, 2024

@merrymercy

Hello! I'm back from a few days of holiday; I'm on it.

havetc added 7 commits August 26, 2024 16:03
* based on regex constrained generation, with another type of cache added for json (to store the FSM and the regex conversion)
* minor changes to the batch scheduler, to avoid edge cases where it could break
@havetc havetc force-pushed the json-constrained-support branch from e142c61 to 00fd8b4 Compare August 26, 2024 14:36
@merrymercy (Contributor) commented

Please fix the lint error.

@merrymercy merrymercy merged commit 9935f97 into sgl-project:main Aug 26, 2024
8 checks passed
@merrymercy (Contributor) commented

@havetc Thanks for the contribution. It is merged.

@havetc havetc mentioned this pull request Aug 27, 2024
@merrymercy (Contributor) commented

Hi @havetc, actually a better approach is that we do not need to change anything inside the server.
We can just do the JSON schema -> regex conversion in this function: https://github.com/havetc/sglang/blob/5ff25cdf5b1310e83d9e595142b39ae4d7b561e9/python/sglang/srt/sampling/sampling_params.py#L114

Then all the changes in python/sglang/srt/managers/tp_worker.py and python/sglang/srt/constrained/fsm_cache.py would no longer be needed.

Does that make sense? If so, can you simplify your code in another PR?
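
Roughly, a sketch of that suggestion with assumed names (the real SamplingParams has many more fields, and the linked line is its verification method): convert the JSON schema to a regex once, up front, so downstream code only ever sees a regex constraint.

```python
# Hedged sketch of the suggested approach; class shape and method name
# are assumptions for illustration, not the actual sglang code.
from typing import Optional

from outlines.fsm.json_schema import build_regex_from_schema  # path may vary by outlines version


class SamplingParams:
    def __init__(self, regex: Optional[str] = None, json_schema: Optional[str] = None):
        self.regex = regex
        self.json_schema = json_schema

    def verify(self):
        # Fold the JSON constraint into the existing regex path.
        if self.json_schema is not None:
            self.regex = build_regex_from_schema(self.json_schema)
            self.json_schema = None
```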

@havetc (Contributor, Author) commented Aug 28, 2024

@merrymercy But as I understand it, that wouldn't cache the JSON-to-regex compilation? As it is a costly operation that can take several seconds on CPU, it really makes sense to cache it so it isn't redone for every request.

@qeternity (Contributor) commented

Hi all - thanks for this. I noticed that this doesn't actually expose json_schema to the gen function, which we'd love to be able to use. I just wanted to raise this (not sure if another PR was planned to introduce it) and also get it onto a branch that we could use internally.

I opened a PR here: #1254 (there may be other areas that need updating; I just wanted to hack something together quickly so we could experiment).

@havetc (Contributor, Author) commented Aug 28, 2024

@qeternity
Indeed, that seems like a miss. I tried to find where to update the code to add global support for JSON schemas, but as I only ever use sglang as an inference server, I didn't think about the gen function.
Your pull request looks good to me (I haven't tested it, though). One thing that might be worth updating is the JSON examples, which currently use only regex. (Also, feel free to extend the JSON unit test with a gen function test if you want to.)

@qeternity (Contributor) commented

I have been testing for the past few hours, and it's all working well. But @merrymercy will have a better view of whether there are other places that also need to be updated to accept json_schema.

I will have a look at the docs and tests later if I have time.

@merrymercy (Contributor) commented Aug 29, 2024

@havetc I see. In that case, we can also add a cache on the TokenizerManager to cache the JSON -> regex conversion. That way, the conversion even overlaps with GPU computation. I just want to keep the core part in tp_worker as minimal as possible, so we want to remove this duplication in handling both regex and JSON schemas.
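
A minimal sketch of that caching idea (the function name and cache size are illustrative assumptions): memoize the conversion keyed by the schema string, so the expensive compilation runs at most once per distinct schema.

```python
# Hedged sketch: memoize the JSON -> regex conversion in the
# TokenizerManager process so repeated schemas skip recompilation.
import functools

from outlines.fsm.json_schema import build_regex_from_schema  # path may vary by outlines version


@functools.lru_cache(maxsize=128)
def json_schema_to_regex(json_schema: str) -> str:
    # Conversion can take seconds on CPU, so cache by schema string.
    return build_regex_from_schema(json_schema)
```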

qeternity pushed a commit to qeternity/sglang that referenced this pull request Sep 1, 2024