examples : generate JSON according to schema #1887
Conversation
Can we do this logic in C++ so that we can support this in main?
Well, this is usable with
Agreed, C++ may be complex, but JSON is already used in the server example; check examples/server/json.hpp
Yeah, I thought based on the discussion that the JSON dependency meant that server had to be CMake-only and excluded from the Makefile. It does look like it's in the Makefile, although hidden behind a flag, so I might be wrong. That said, there's still the complexity point. Do you feel that JSON schema support directly in main (vs. a preprocessor to main) is sufficiently valuable to warrant the extra effort?
My main usage today is an LLM as a backend for functions. In that scenario, I always prefer that the LLM return a valid JSON string so the result is easy to parse, which makes JSON support very useful.
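For that workflow, the appeal is that a grammar-constrained completion is guaranteed to parse. A minimal sketch of the idea in Python (the model path, the prompt-echo handling, and the exact flags are assumptions for illustration, not taken from this PR):

```python
import json
import subprocess

def call_llm_function(prompt: str, grammar_path: str) -> dict:
    """Run llama.cpp's `main` with a JSON grammar and parse the output.

    Sketch only: assumes `main` echoes the prompt before the completion
    and that the grammar constrains the completion to valid JSON.
    """
    result = subprocess.run(
        ["./main", "-m", "model.bin", "-p", prompt,
         "--grammar-file", grammar_path],   # flags assumed, verify locally
        capture_output=True, text=True, check=True,
    )
    completion = result.stdout[len(prompt):]  # strip the echoed prompt (assumption)
    return json.loads(completion)  # the grammar forces syntactically valid JSON
```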
It doesn't seem to match the schema on json.org; for example, the root can also be an array, a string, or any other value. I'm not sure it's possible to transform that grammar into ours; the string escaping is probably the hardest part.
There are two separate grammars here. As for the root type: in the schema-converted grammar, the root rule is generated from whatever type the schema's root declares. Regarding the JSON spec, for this iteration I carefully followed the syntax on json.org for numbers and strings, so it should in fact be compliant. The escaping is indeed there now:
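A string rule that follows json.org's escape syntax looks roughly like this (a reconstruction for illustration, not necessarily the exact rule shipped in this PR):

```
string ::= "\"" (
  [^"\\] |
  "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
)* "\""
```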
@ggerganov any interest in giving this a quick look?
Great work!
Thank you again for such cool contributions!
I tried the example from https://json-schema.org/learn/getting-started-step-by-step.html and the converter had issues with some of the properties, but it works otherwise 👍
Thanks! @SlyEcho I added support for the
Referenced in an external commit: "…d grammars by including the `messages` field and adjusting the endpoint to `/v1/chat/completions`."
Adds a Python script that converts a JSON schema into the grammar format from #1773. This allows generating JSON that conforms to a schema, similar to Jsonformer or OpenAI's function calling.
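To make the approach concrete, here is a toy converter in the same spirit (a minimal sketch, not the script from this PR: it handles only objects and a few primitive types, and all names are invented for illustration):

```python
# Toy JSON-schema -> GBNF converter: sketches the idea behind the script in
# this PR, NOT the actual implementation. It supports only a handful of
# schema types and emits one grammar rule per schema node.

PRIMITIVE_RULES = {
    "string": '"\\"" [^"]* "\\""',        # simplified: no escape sequences
    "number": '("-"? [0-9]+ ("." [0-9]+)?)',
    "boolean": '("true" | "false")',
}

def schema_to_rules(schema, name="root", rules=None):
    """Recursively turn a schema node into GBNF rules keyed by rule name."""
    rules = rules if rules is not None else {}
    t = schema.get("type")
    if t == "object":
        parts = []
        for i, (prop, sub) in enumerate(schema.get("properties", {}).items()):
            sub_name = f"{name}-{prop}"          # one rule per property
            schema_to_rules(sub, sub_name, rules)
            sep = '"," ' if i > 0 else ""
            parts.append(f'{sep}"\\"{prop}\\":" {sub_name}')
        rules[name] = '"{" ' + " ".join(parts) + ' "}"'
    elif t in PRIMITIVE_RULES:
        rules[name] = PRIMITIVE_RULES[t]
    else:
        raise ValueError(f"unsupported type: {t!r}")
    return rules

if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
    }
    for rule_name, body in schema_to_rules(schema).items():
        print(f"{rule_name} ::= {body}")
```

Run as-is, this prints one GBNF rule per schema node with `root` describing the whole object; the actual script covers much more of the schema vocabulary and handles string escaping properly.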
Examples
Jsonformer student example (schema sketched after this list)
Jsonformer car example
OpenAI-style function calling
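For reference, the Jsonformer student example is driven by a schema roughly like the one below (reproduced from memory of the Jsonformer README, so treat the details as illustrative); the trailing comment shows one assumed way to wire it through the converter:

```python
# The Jsonformer "student" schema, roughly (illustrative, from memory):
student_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {"type": "array", "items": {"type": "string"}},
    },
}

# Assumed end-to-end flow (script path and flags not verified here):
#   python examples/json-schema-to-grammar.py schema.json > student.gbnf
#   ./main -m model.bin --grammar-file student.gbnf \
#          -p "Generate a student record as JSON:"
```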