examples : generate JSON according to schema #1887

Merged
merged 5 commits into master from json-schema on Aug 3, 2023

Conversation

@ejones (Collaborator) commented Jun 16, 2023

Adds a Python script that converts a JSON schema into the grammar format from #1773. This allows generating JSON according to a schema, like Jsonformer or OpenAI's function calling.
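
For a sense of the core mapping, here is a simplified sketch of the schema-to-grammar walk (an illustration only, not the actual script; the script in this PR additionally handles `oneOf`, `const`, and `--prop-order`, and the `ws` rule name below stands in for the whitespace rules it emits):

```python
# Simplified illustration of the JSON-schema -> GBNF mapping (not the real
# examples/json-schema-to-grammar.py): one rule per schema node, with
# primitives falling back to shared rules like string/number/boolean.
def schema_to_rules(schema, name, rules):
    t = schema.get("type")
    if t == "object":
        parts = ['"{" ws']
        for i, (key, sub) in enumerate(schema.get("properties", {}).items()):
            sub_name = f"{name}-{key}"
            schema_to_rules(sub, sub_name, rules)
            if i > 0:
                parts.append('"," ws')
            parts.append(f'"\\"{key}\\"" ws ":" ws {sub_name}')
        parts.append('"}" ws')
        rules[name] = " ".join(parts)
    elif t == "array":
        item = f"{name}-item"
        schema_to_rules(schema.get("items", {}), item, rules)
        rules[name] = f'"[" ws ({item} ("," ws {item})*)? "]" ws'
    else:
        rules[name] = {"string": "string", "number": "number",
                       "boolean": "boolean"}.get(t, "value")
    return rules

rules = schema_to_rules(
    {"type": "object", "properties": {"name": {"type": "string"},
                                      "age": {"type": "number"}}},
    "root", {})
for lhs, rhs in rules.items():
    print(f"{lhs} ::= {rhs}")
```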

Examples

Jsonformer Student Example
% cat ../schemas/student.json 
 {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}
% ./main -m $LLAMA_13B_Q4_0 --grammar "$( python3 examples/json-schema-to-grammar.py ../schemas/student.json --prop-order 'is_student,name,age' )" -p 'Hermione Granger '
main: build = 694 (e8259e4)
main: seed  = 1686892597
llama.cpp: loading model from /Users/evan/llama-models/13B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9031.70 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 8 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


main: grammar:
<0>space_1 ::= <2>[ - ] | 
<9>space ::= <11>space_1 
...

 Hermione Granger { "is_student" : true, "name" : "Hermione", "age" :12, "courses" : [ "muggle studies","history of magic" , "charms","potion" ]} [end of text]

llama_print_timings:        load time =   396.96 ms
llama_print_timings:      sample time =    55.45 ms /    57 runs   (    0.97 ms per token)
llama_print_timings: prompt eval time =   347.81 ms /     6 tokens (   57.97 ms per token)
llama_print_timings:        eval time =  3898.12 ms /    56 runs   (   69.61 ms per token)
llama_print_timings:       total time =  4306.70 ms
Jsonformer car example
% cat ../schemas/car.json 
{"type": "object", "properties": {"car": {"type": "object", "properties": {"make": {"type": "string"}, "model": {"type": "string"}, "year": {"type": "number"}, "colors": {"type": "array", "items": {"type": "string"}}, "features": {"type": "object", "properties": {"audio": {"type": "object", "properties": {"brand": {"type": "string"}, "speakers": {"type": "number"}, "hasBluetooth": {"type": "boolean"}}}, "safety": {"type": "object", "properties": {"airbags": {"type": "number"}, "parkingSensors": {"type": "boolean"}, "laneAssist": {"type": "boolean"}}}, "performance": {"type": "object", "properties": {"engine": {"type": "string"}, "horsepower": {"type": "number"}, "topSpeed": {"type": "number"}}}}}}}, "owner": {"type": "object", "properties": {"firstName": {"type": "string"}, "lastName": {"type": "string"}, "age": {"type": "number"}}}}}
 % ./main -m $LLAMA_13B_Q4_0 --grammar "$( python3 examples/json-schema-to-grammar.py ../schemas/car.json --prop-order 'car,make,model,owner,firstName,lastName,age,year' )" -p 'Brought the 97 Civic in '
main: build = 694 (e8259e4)
main: seed  = 1686892847
llama.cpp: loading model from /Users/evan/llama-models/13B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9031.70 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 8 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


main: grammar:
<0>space_1 ::= <2>[ - ] | 
<9>space ::= <11>space_1 
...

 Brought the 97 Civic in { "car" : { "make" : "Honda", "model" : "Civic", "year" :1997, "colors": [ "Black","Silver","Gray"] , "features":{ "audio": {"brand": "Bose", "hasBluetooth": false, "speakers":10}, "performance":{"engine": "K20A2", "horsepower":230,"topSpeed":185},"safety": {"airbags":10, "laneAssist":false,"parkingSensors":false}} } , "owner" : { "firstName":"Brian","lastName":"O'Donnell" , "age":32} } [end of text]

llama_print_timings:        load time =   324.46 ms
llama_print_timings:      sample time =   196.27 ms /   182 runs   (    1.08 ms per token)
llama_print_timings: prompt eval time =   707.57 ms /    12 tokens (   58.96 ms per token)
llama_print_timings:        eval time = 12594.43 ms /   181 runs   (   69.58 ms per token)
llama_print_timings:       total time = 13515.57 ms
OpenAI-style function calling
% cat ../schemas/functions.json 
{
    "oneOf": [
        {
            "type": "object",
            "properties": {
                "function": {"const": "create_event"},
                "arguments": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "date": {"type": "string"},
                        "time": {"type": "string"}
                    }
                }
            }
        },
        {
            "type": "object",
            "properties": {
                "function": {"const": "search"},
                "arguments": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"}
                    }
                }
            }
        }
    ]
}
% ./main -m $LLAMA_13B_Q4_0 --grammar "$( python3 examples/json-schema-to-grammar.py ../schemas/functions.json --prop-order 'function,arguments' )" -p $'Transcript of AI assistant responding to user requests. It uses the APIs "search" and "create_event"\n\nRequest: Call mom at 5pm \nFunction Call: '
main: build = 694 (e8259e4)
main: seed  = 1686893039
llama.cpp: loading model from /Users/evan/llama-models/13B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9031.70 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 8 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


main: grammar:
<0>space_1 ::= <2>[ - ] | 
<9>space ::= <11>space_1 
<15>0-function ::= ..

 Transcript of AI assistant responding to user requests. It uses the APIs "search" and "create_event"

Request: Call mom at 5pm 
Function Call: {"function":"create_event","arguments":{"date":"2017-11-16T18:00:00+00:00","time":"17:00" , "title":"Call my mom" }}  [end of text]

llama_print_timings:        load time =   302.69 ms
llama_print_timings:      sample time =    63.82 ms /    63 runs   (    1.01 ms per token)
llama_print_timings: prompt eval time =  3517.46 ms /    42 tokens (   83.75 ms per token)
llama_print_timings:        eval time =  4388.51 ms /    62 runs   (   70.78 ms per token)
llama_print_timings:       total time =  7975.77 ms
% ./main -m $LLAMA_13B_Q4_0 --grammar "$( python3 examples/json-schema-to-grammar.py ../schemas/functions.json --prop-order 'function,arguments' )" -p $'Transcript of AI assistant responding to user requests. It uses the APIs "search" and "create_event"\n\nRequest: What meetings are happening this afternoon? \nFunction Call: ' 
main: build = 694 (e8259e4)
...

 Transcript of AI assistant responding to user requests. It uses the APIs "search" and "create_event"

Request: What meetings are happening this afternoon? 
Function Call: { "function": "search", "arguments": { "query": "what meetings are happening today?" } }  [end of text]

llama_print_timings:        load time =   300.87 ms
llama_print_timings:      sample time =    30.92 ms /    32 runs   (    0.97 ms per token)
llama_print_timings: prompt eval time =  3535.50 ms /    44 tokens (   80.35 ms per token)
llama_print_timings:        eval time =  2114.93 ms /    31 runs   (   68.22 ms per token)
llama_print_timings:       total time =  5684.63 ms

@howard0su (Collaborator)

Can we do this logic in C++ so that we can support this in main?

@ejones (Collaborator, Author) commented Jun 18, 2023

Well, this is usable with main (as in the examples) as an input to --grammar. In general, I think it would be more complex to do in C++. And the lack of built-in JSON would be a challenge (I believe C++ examples here using the JSON library have to be left out of make)
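
For anyone who wants to script that step, here is a minimal sketch of driving the preprocessor from Python (the schema path, model path, and prompt are placeholders for your own setup):

```python
import subprocess

# Sketch: generate a grammar from a schema file, then pass it to ./main.
# "schema.json" and the model path are placeholders; adjust to your setup.
grammar = subprocess.check_output(
    ["python3", "examples/json-schema-to-grammar.py", "schema.json"],
    text=True,
)
subprocess.run(
    ["./main", "-m", "models/13B/ggml-model-q4_0.bin",
     "--grammar", grammar, "-p", "Hermione Granger "],
    check=True,
)
```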

@howard0su (Collaborator)

> Well, this is usable with main (as in the examples) as an input to --grammar. In general, I think it would be more complex to do in C++. And the lack of built-in JSON would be a challenge (I believe C++ examples here using the JSON library have to be left out of make)

Agree that C++ may be complex, but JSON is already used in the server example; see examples/server/json.hpp.

@ejones (Collaborator, Author) commented Jun 21, 2023

Yeah, I thought based on the discussion that the JSON dependency meant that server had to be CMake-only and excluded from the Makefile. It does look like it's in the Makefile, although hidden behind a flag, so I might be wrong.

That said, there's still the complexity point. Do you feel that JSON schema support directly in main (vs a preprocessor to main) is sufficiently valuable to warrant the extra effort?

@howard0su (Collaborator)

My main usage today is using an LLM as a backend for functions. In that scenario, I would always prefer that the LLM return a valid JSON string so the result is easy to parse, which makes JSON support very useful.

@mudler mentioned this pull request on Jul 6, 2023
@ejones marked this pull request as ready for review on Jul 24, 2023
@ejones (Collaborator, Author) commented Jul 27, 2023

@slaren or @SlyEcho, are either of you interested in reviewing this?

@SlyEcho (Collaborator) commented Jul 27, 2023

It doesn't seem to match the grammar on json.org; for example, the root can also be an array, a string, or any other value. I'm not sure whether it's possible to transform that grammar into ours; the string escaping is probably the hardest part.

@ejones (Collaborator, Author) commented Jul 28, 2023

There are two separate grammars here - grammars/json.gbnf is a standalone, sample grammar, while examples/json-schema-to-grammar.py stitches a grammar together dynamically based on a schema. I just opted to update the generic JSON grammar in conjunction with this script to bring it up to spec (more on that later).

As for the root type: in grammars/json.gbnf, I tried setting root ::= value, but without any context, the model was likely to just produce e.g., a number and quit. Restricting it to an object seemed to give the best (most interesting) outcome for testing out JSON generation in the general case. For the schema-driven script, I've just pushed a fix to ensure that you can in fact generate from a schema denoting a primitive value, if that is of use to anyone.
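
As a rough illustration (not the script's literal output), something like this should now work for a primitive-value schema:

```python
import json, subprocess, tempfile

# Hypothetical check that a primitive-value schema now converts; the printed
# grammar is expected to reduce root to the shared number rule (the exact
# output format here is an approximation).
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"type": "number"}, f)

grammar = subprocess.check_output(
    ["python3", "examples/json-schema-to-grammar.py", f.name], text=True)
print(grammar)
```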

Regarding the JSON spec, for this iteration I carefully followed the syntax on json.org for numbers and strings so it should in fact be compliant. The escaping is indeed there now:

string ::=
  "\"" (
    [^"\\] |
    "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
  )* "\"" ws

@ejones (Collaborator, Author) commented Aug 2, 2023

@ggerganov any interest in giving this a quick look?

@ggerganov (Owner) left a comment

Great work!

Thank you again for such cool contributions!

@SlyEcho (Collaborator) commented Aug 2, 2023

I tried the example from https://json-schema.org/learn/getting-started-step-by-step.html and the converter had issues with the `description` and `integer` properties.

But it works otherwise 👍

@ejones (Collaborator, Author) commented Aug 3, 2023

Thanks!

@SlyEcho I added support for `integer`, so that tutorial now runs up to the point where they split up the schemas:

% ./main -m $LLAMA2_13B_Q4_0 --grammar "$( python3 examples/json-schema-to-grammar.py ../schemas/getting-started-full.json --prop-order 'productName,price,productId,dimensions' )" 

...

 {"productName":"Blu-ray+DVD: The Good Dinosaur","price":10,"productId":452389,"dimensions":{"height":267,"length":152.4,"width":178},"tags":["Blu-ray","Comedy","Drama","Kids \u0026 Family","Sci-Fi \u0026 Fantasy"]} [end of text]

llama_print_timings:        load time =   351.81 ms
llama_print_timings:      sample time =   391.91 ms /   103 runs   (    3.80 ms per token,   262.81 tokens per second)
llama_print_timings: prompt eval time =   114.75 ms /     2 tokens (   57.38 ms per token,    17.43 tokens per second)
llama_print_timings:        eval time =  6675.73 ms /   102 runs   (   65.45 ms per token,    15.28 tokens per second)
llama_print_timings:       total time =  7242.08 ms
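
For context, ../schemas/getting-started-full.json follows the tutorial's product schema; a trimmed approximation is below (field set inferred from the tutorial and the output above, not copied from the actual file):

```python
# Trimmed, approximate version of the json-schema.org "getting started"
# product schema used above; inferred, not copied from the actual file.
getting_started_schema = {
    "type": "object",
    "properties": {
        "productId": {"type": "integer"},
        "productName": {"type": "string"},
        "price": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "dimensions": {
            "type": "object",
            "properties": {
                "length": {"type": "number"},
                "width": {"type": "number"},
                "height": {"type": "number"},
            },
        },
    },
}
```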

@ejones merged commit 8183159 into ggerganov:master on Aug 3, 2023
@ejones deleted the json-schema branch on Aug 14, 2023
lenaxia pushed a commit to lenaxia/home-ops-prod that referenced this pull request Apr 27, 2024
…d grammars by including the `messages` field and adjusting the endpoint to `/v1/chat/completions`.

# Aider chat conversation:

USER: https://localai.io/features/constrained_grammars/:


# ✍️ Constrained grammars

The chat endpoint accepts an additional `grammar` parameter which takes
a [BNF defined
grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form).

This constrains the LLM's output to a user-defined schema, making it possible
to generate `JSON`, `YAML`, and anything else that can be defined with a BNF
grammar.


This feature works only with models compatible with the
[llama.cpp](https://github.com/ggerganov/llama.cpp) backend (see also
[Model compatibility](/model-compatibility/)). For details on how it
works, see the upstream PRs:
<ggerganov/llama.cpp#1773>,
<ggerganov/llama.cpp#1887>

## Setup

Follow the setup instructions from the [LocalAI
functions](/features/openai-functions/) page.

## 💡 Usage example

For example, to constrain the output to either `yes` or `no`:

    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
      "model": "gpt-4",
      "messages": [{"role": "user", "content": "Do you like apples?"}],
      "grammar": "root ::= (\"yes\" | \"no\")"
    }'


ASSISTANT: Based on the information provided about LocalAI's constrained grammars feature, it seems that the `handleCompletions` function in the `rag-api.go` file needs to be updated to support constrained grammars when making a request to the LocalAI service. The request payload should include a `grammar` field when it is provided in the `CompletionRequest`.

Here is the diff for the necessary changes:

```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)

 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()

 	embedding, err := getEmbeddings(req.Prompt)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}

-	// Generate completion using constrained grammar (if requested)
-	var completion CompletionResponse
-	if req.ConstrainedGrammar != "" {
-		payload := map[string]interface{}{
-			"model":   "gpt-4",
-			"prompt":  req.Prompt,
-			"grammar": req.ConstrainedGrammar,
-			"max_tokens": req.MaxTokens,
-			"temperature": req.Temperature,
-			"top_p": req.TopP,
-		}
-		jsonPayload, err := json.Marshal(payload)
-		if err != nil {
-			http.Error(w, err.Error(), http.StatusInternalServerError)
-			return
-		}
+	// Generate completion using constrained grammar or retrieval-augmented generation (if requested)
+	var completionResp CompletionResponse
+	payload := map[string]interface{}{
+		"model":       "gpt-4",
+		"prompt":      req.Prompt,
+		"max_tokens":  req.MaxTokens,
+		"temperature": req.Temperature,
+		"top_p":       req.TopP,
+	}
+	if req.ConstrainedGrammar != "" {
+		payload["grammar"] = req.ConstrainedGrammar
+	}
+	jsonPayload, err := json.Marshal(payload)
+	if err != nil {
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}

 	localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 	if localAI == "" {
 		localAI = defaultLocalAI
 	}

-	resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
+	resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}
 	defer resp.Body.Close()

 	if resp.StatusCode != http.StatusOK {
 		body, err := io.ReadAll(resp.Body)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 		return
 	}

-	var respBody struct {
-		Result CompletionResponse `json:"result"`
-	}
-	if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
+	if err := json.NewDecoder(resp.Body).Decode(&completionResp); err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}

-	completion = respBody.Result
+	jsonResp, err := json.Marshal(completionResp)
+	if err != nil {
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}

-	jsonResp, err := json.Marshal(completion)
-	if err != nil {
-		http.Error(w, err.Error(), http.StatusInternalServerError)
-		return
-	}

 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(http.StatusOK)
 	w.Write(jsonResp)
 }
```

This diff updates the `handleCompletions` function to include the `grammar` field in the request payload if it is provided in the `CompletionRequest`. It also corrects the endpoint to `/completions` as per the LocalAI documentation.

USER: UnifiedDiffNoMatch: hunk failed to apply!

cluster/apps/home/localai/app/config/rag-api.go does not contain lines that match the diff you provided!
Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The diff needs to apply cleanly to the lines in cluster/apps/home/localai/app/config/rag-api.go!

cluster/apps/home/localai/app/config/rag-api.go does not contain these 76 exact lines in a row:
```
func handleCompletions(w http.ResponseWriter, r *http.Request) {
	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
	logRequest(r)

	var req CompletionRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer r.Body.Close()

	embedding, err := getEmbeddings(req.Prompt)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Generate completion using constrained grammar (if requested)
	var completion CompletionResponse
	if req.ConstrainedGrammar != "" {
		payload := map[string]interface{}{
			"model":   "gpt-4",
			"prompt":  req.Prompt,
			"grammar": req.ConstrainedGrammar,
			"max_tokens": req.MaxTokens,
			"temperature": req.Temperature,
			"top_p": req.TopP,
		}
		jsonPayload, err := json.Marshal(payload)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

	localAI := os.Getenv("LOCAL_AI_ENDPOINT")
	if localAI == "" {
		localAI = defaultLocalAI
	}

	resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
		return
	}

	var respBody struct {
		Result CompletionResponse `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	completion = respBody.Result

	jsonResp, err := json.Marshal(completion)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	w.Write(jsonResp)
}
```

ASSISTANT: I apologize for the confusion. Let's try again with the correct diff that applies cleanly to the `cluster/apps/home/localai/app/config/rag-api.go` file. I will update the `handleCompletions` function to correctly handle the constrained grammars feature as described in the LocalAI documentation.

```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)

 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()

 	// Generate completion using constrained grammar (if requested)
 	var completion CompletionResponse
 	if req.ConstrainedGrammar != "" {
 		payload := map[string]interface{}{
 			"model":   "gpt-4",
 			"prompt":  req.Prompt,
-			"grammar": req.ConstrainedGrammar,
+			"grammar": req.ConstrainedGrammar,
 			"max_tokens": req.MaxTokens,
 			"temperature": req.Temperature,
 			"top_p": req.TopP,
 		}
 		jsonPayload, err := json.Marshal(payload)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}

 		localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 		if localAI == "" {
 			localAI = defaultLocalAI
 		}

-		resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
+		resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		defer resp.Body.Close()

 		if resp.StatusCode != http.StatusOK {
 			body, err := io.ReadAll(resp.Body)
 			if err != nil {
 				http.Error(w, err.Error(), http.StatusInternalServerError)
 				return
 			}
 			http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 			return
 		}

 		var respBody struct {
 			Result CompletionResponse `json:"result"`
 		}
 		if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}

 		completion = respBody.Result
 	} else {
 		// Implement retrieval-augmented generation using retrieved data
 		// ...
 	}

 	jsonResp, err := json.Marshal(completion)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}

 	w.Header().Set
^C KeyboardInterrupt

USER: https://localai.io/features/constrained_grammars/ (same page content as above, pasted again)

ASSISTANT: To implement the constrained grammars feature as described in the LocalAI documentation, we need to update the `handleCompletions` function in the `rag-api.go` file to correctly handle the `grammar` parameter in the request payload. Here is the diff for the necessary changes:

```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)

 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()

 	// Generate completion using constrained grammar (if requested)
 	var completion CompletionResponse
 	if req.ConstrainedGrammar != "" {
 		payload := map[string]interface{}{
 			"model":   "gpt-4",
 			"prompt":  req.Prompt,
+			"messages": []map[string]string{
+				{
+					"role":    "user",
+					"content": req.Prompt,
+				},
+			},
 			"max_tokens": req.MaxTokens,
 			"temperature": req.Temperature,
 			"top_p": req.TopP,
+			"grammar": req.ConstrainedGrammar,
 		}
 		jsonPayload, err := json.Marshal(payload)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}

 		localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 		if localAI == "" {
 			localAI = defaultLocalAI
 		}

-		resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
+		resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		defer resp.Body.Close()

 		if resp.StatusCode != http.StatusOK {
 			body, err := io.ReadAll(resp.Body)
 			if err != nil {
 				http.Error(w, err.Error(), http.StatusInternalServerError)
 				return
 			}
 			http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 			return
 		}

 		var respBody struct {
 			Result CompletionResponse `json:"result"`
 		}
 		if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}

 		completion = respBody.Result
 	} else {
 		// Implement retrieval-augmented generation using retrieved data
 		// ...
 	}

 	jsonResp, err := json.Marshal(completion)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}

 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(http.StatusOK)
 	w.Write(jsonResp)
 }
```

This diff adds the `messages` field to the payload as required by the LocalAI API for constrained grammars and updates the endpoint to `/v1/chat/completions`. It also includes the `grammar` field when the `ConstrainedGrammar` is not empty.