Server: add support for "tool_calls" (MeetKai/functionary model) #5695
base: master
Conversation
Demo
GGUF model is downloaded from this link: https://huggingface.co/meetkai/functionary-small-v2.2-GGUF/tree/main
I'm using functionary-small-v2.2.q4_0.gguf in this demo.
Turn 1: User asks and the assistant wants to call a tool
Response:
Turn 2: Function is called and returns data to the assistant
Response:
The final conversation should look like this:
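The full request/response payloads live in the collapsed demo sections above. As a rough illustration of the shape of this exchange, here is a hedged sketch built with nlohmann::json (as in the server code); the weather tool, its arguments, and the model name are invented for this sketch, and field names follow the standard OpenAI chat-completions schema:

```cpp
// Illustrative sketch only: the tool, its arguments, and the replies are
// made up; the real payloads are in the collapsed demo sections of this PR.
#include <iostream>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main() {
    // Turn 1: the client sends the user question plus the available tools.
    json turn1_request = {
        {"model", "functionary-small-v2.2"},
        {"messages", json::array({
            {{"role", "user"}, {"content", "What is the weather in Hanoi?"}}
        })},
        {"tools", json::array({
            {{"type", "function"}, {"function", {
                {"name", "get_weather"},
                {"parameters", {
                    {"type", "object"},
                    {"properties", {{"location", {{"type", "string"}}}}}
                }}
            }}}
        })}
    };

    // The assistant replies with a "tool_calls" entry instead of plain content.
    // Turn 2: the client runs the function and sends the result back with
    // role "tool"; the assistant then produces the final natural-language answer.
    json turn2_tool_message = {
        {"role", "tool"},
        {"name", "get_weather"},
        {"content", R"({"temperature": "30C"})"}
    };

    std::cout << turn1_request.dump(2) << "\n"
              << turn2_tool_message.dump(2) << std::endl;
    return 0;
}
```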
@ggerganov @phymbert Could you take a bit of time to give me some input regarding the testing part? Thanks in advance!
This is an interesting application, but keep in mind I consider it low priority to merge in the short term as it adds even more functionality to
Thanks for the info. As I'm not expecting it to be merged very soon, my work has already been done in a self-contained manner to prevent conflicts with future reworks on the server side. I'll keep this PR in draft state though, as some parts are still missing. Will come back to this when the server code becomes more stable.
This is a super wonderful feature, exactly what I'm looking for! Without tool use, many other useful features would not be possible. Hoping the feature can be merged soon!
if (enable_tool_calls) {
    choices = llama_functionary::convert_response_to_oai_choices(content);
} else {
    choices = streaming
Is streaming mode not supported for tool_calls?
No, it is not, because convert_response_to_oai_choices can only parse a fully constructed response.
The code to throw an error in streaming mode is not implemented yet, but I left a // TODO: "enable_tool_calls" cannot be used with "stream" mode below it.
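For concreteness, a minimal sketch (not the PR's actual code) of what that guard could look like, assuming the request body is the nlohmann::json object the server already parses and reusing the enable_tool_calls flag visible in the diff; the function name is hypothetical:

```cpp
// Hedged sketch of the missing check: reject "stream": true whenever tool
// calls are enabled, since convert_response_to_oai_choices can only parse a
// fully constructed (non-streamed) response. Names other than
// enable_tool_calls are assumptions.
#include <stdexcept>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

static void validate_tool_call_request(const json & body, bool enable_tool_calls) {
    const bool streaming = body.value("stream", false);
    if (enable_tool_calls && streaming) {
        throw std::invalid_argument(
            "\"enable_tool_calls\" cannot be used with \"stream\" mode");
    }
}
```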
using json = nlohmann::json;
/**
 * A simple test program that allows testing functionary.hpp without using the server.
As it is a simple unit test, better to go with the ctest approach as in the root repo.
Yeah, you're right, it should be a ctest. The problem is that the ctest in the root CMakeLists is for the core library, not for the examples.
I believe I'll need to convert this file to a ctest anyway; maybe the ctest will run along with behave. I'll see what the best approach is when I have time to continue working on this PR.
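A possible shape for such a ctest-friendly functionary-test.cpp, sketched under assumptions: the functionary.hpp include and the convert_response_to_oai_choices signature are inferred from the diff hunk above, and the sample model output is a placeholder rather than the real functionary prompt format:

```cpp
// Sketch only: ctest treats a non-zero exit code as a failed test, so the
// existing test program mainly needs to return EXIT_FAILURE on mismatches.
// The input below is a placeholder, not real functionary-formatted output.
#include <cstdlib>
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

#include "functionary.hpp" // header added by this PR (assumed include path)

using json = nlohmann::json;

int main() {
    const std::string fake_model_output = "<placeholder functionary response>";

    // Assumed signature: raw model output in, OAI-style "choices" array out.
    json choices = llama_functionary::convert_response_to_oai_choices(fake_model_output);

    if (!choices.is_array()) {
        std::cerr << "expected a JSON array of choices" << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```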
In the CI server workflow, we can plug a ctest target only for the server. I will push you an example on a side branch.
More and more function-calling-capable models are becoming available:
Since Hermes 2 Pro has chosen OpenAI-compatible schemas for their training, an implementation on top of llama.cpp was pretty easy:
https://github.com/adrianliechti/llama/blob/main/pkg/adapter/hermesfn/adapter.go
I understand this might not be the most appropriate forum for my observation, but while researching llama.cpp, I noticed that the server component of this repository seems to receive a lot of attention. It appears to me that this might be breaking the principle of isolation of concerns. Would @ggerganov consider extracting the server component to a separate repository? If this issue has already been raised, could you please direct me to it? Thank you.
Hi, as you can see in the reference above, function calling has been merged via another pull request.
Support "tool_calls" OAI-compatible via MeetKai/functionary model
Motivation
Following my research in #5588, I tried to implement the ability to use https://github.com/MeetKai/functionary
The idea is that the user can use the same OAI "tool_calls" included in /v1/chat/completions to interact with the model. There will be a translation layer to convert OAI schema <==> prompt.
Implementation
My implementation is self-contained inside functionary.hpp, with a simple functionary-test.cpp which allows me to test it without make server.
The current call stack looks like this (without tool_calls):
With tool_calls enabled:
Upon loading the model, the template stored inside the model is read, and if it is functionary's template, tool_calls will be enabled automatically. No additional config is required.
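To make the translation layer concrete, here is a hedged sketch of the interface shape functionary.hpp could expose; only the llama_functionary namespace and convert_response_to_oai_choices appear in this PR's diff, while the other declaration and all exact signatures are illustrative assumptions:

```cpp
// Sketch of a possible functionary.hpp-style interface. Only the namespace
// and convert_response_to_oai_choices are taken from this PR's diff; the
// other name and the exact signatures are assumptions.
#pragma once

#include <string>
#include <nlohmann/json.hpp>

namespace llama_functionary {

// OAI request (messages + tools) ==> functionary prompt string
// (hypothetical helper for the request direction).
std::string convert_oai_to_prompt(const nlohmann::json & messages,
                                  const nlohmann::json & tools);

// Raw model output ==> OAI "choices" array, containing "tool_calls"
// entries when the model decided to call a function.
nlohmann::json convert_response_to_oai_choices(const std::string & content);

} // namespace llama_functionary
```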
For the demo, see the comment section.
Testing
For now, I have no idea how to test it in CI. These changes are needed:
- functionary-test.cpp in CI