
Structured outputs as an alternative to Tool Calling #582

Open
lazyhope opened this issue Jan 1, 2025 · 11 comments
Labels
enhancement New feature or request

Comments

lazyhope commented Jan 1, 2025

Issue Description:

Currently, pydantic-ai implements structured output solely via model providers' tool-calling APIs. While this works in most cases, some schemas that Pydantic supports are handled inconsistently across model providers.

For instance, the following schema from the documentation does not work with Gemini models:

from datetime import date
from typing import TypedDict

from pydantic_ai import Agent

class UserProfile(TypedDict, total=False):
    name: str
    dob: date
    bio: str

agent = Agent(
    'gemini-2.0-flash-exp',
    result_type=UserProfile,
)
agent.run_sync("Generate synthetic data")

This results in the following error:

UnexpectedModelBehavior: Unexpected response from gemini 400, body:
{
  "error": {
    "code": 400,
    "message": "* GenerateContentRequest.tools[0].function_declarations[0].parameters.properties[dob].format: only 'enum' is supported for STRING type\n",
    "status": "INVALID_ARGUMENT"
  }
}
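
For context, Pydantic renders date fields as {"type": "string", "format": "date"}, and Gemini's function-calling API rejects that format keyword. A possible workaround in the meantime (a hypothetical sketch, not an endorsed fix) is to declare the field as a plain str and parse the date afterwards:

from datetime import date
from typing import TypedDict

class UserProfile(TypedDict, total=False):
    name: str
    dob: str  # plain ISO string avoids the unsupported `format: "date"` keyword
    bio: str

# The date can still be recovered after the run, e.g.:
# dob = date.fromisoformat(profile["dob"])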

In this example, the inconsistency stems from the model provider's limitations. However, from my experience with tools like instructor, modern LLMs are increasingly proficient at adhering to JSON-format prompts in their plain text responses. In fact, they often produce better JSON in standard completion mode than in tool-calling mode; the Berkeley Function-Calling Leaderboard may provide further evidence of this trend.

Feature Request

Would it be possible for pydantic-ai to implement an alternative mode akin to instructor's MD_JSON mode? This mode would use prompt engineering to guide the LLM's output and parse the resulting JSON from the raw text response rather than relying on tool-calling APIs (a rough sketch follows the list below).

Such a feature would:

  • Allow broader compatibility with any model capable of following JSON schema prompts.
  • Address model-specific inconsistencies while leveraging pydantic's full schema flexibility.
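
For illustration, a minimal sketch of what such a mode could look like, built only on Pydantic and the standard library (extract_json and the prompt wording are hypothetical, not part of pydantic-ai):

import json
import re

from pydantic import TypeAdapter

def extract_json(text: str) -> str:
    # Pull the first fenced code block out of a raw completion, else use the whole text
    match = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

adapter = TypeAdapter(UserProfile)
system_prompt = (
    "Respond ONLY with a JSON object matching this schema:\n"
    + json.dumps(adapter.json_schema())
)
# raw = <text returned by any plain completion API given system_prompt>
# profile = adapter.validate_json(extract_json(raw))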

Thank you for considering this suggestion!

@lazyhope lazyhope changed the title Add Support for Prompt-Based JSON Parsing Mode as an Alternative to Tool Calling [Feature] Add Support for Prompt-Based JSON Parsing Mode as an Alternative to Tool Calling Jan 1, 2025
lazyhope (Author) commented Jan 1, 2025

FYI, here’s an example where instructor’s Markdown JSON mode works seamlessly with the UserProfile schema:

import instructor
from datetime import date
from typing import TypedDict
from litellm import completion

class UserProfile(TypedDict, total=False):
    name: str
    dob: date
    bio: str

client = instructor.from_litellm(completion, mode=instructor.Mode.MD_JSON)
# Switching to `instructor.Mode.TOOLS` would result in the same error mentioned earlier
user = client.chat.completions.create(
    model="gemini/gemini-2.0-flash-exp",
    messages=[
        {"role": "user", "content": "Generate a synthetic data"},
    ],
    response_model=UserProfile,
)

user

yields

UserProfile(name='Alice Wonderland', dob=datetime.date(1990, 3, 15), bio='A curious individual who loves to explore and discover new things.')

sydney-runkle (Member) commented:

@samuelcolvin, could you please take a look at this? My understanding is that we're already using json schemas of models to guide coercing outputs to certain types...

lazyhope (Author) commented Jan 2, 2025

> @samuelcolvin, could you please take a look at this? My understanding is that we're already using json schemas of models to guide coercing outputs to certain types...

Yes, but my proposal is to have a mode that, instead of using model providers' tool-calling APIs, parses the raw text response as JSON for a given result_type. This may involve some additional prompting so that the model outputs only JSON in its response.

Here is the current implementation for OpenAI models, which parses the model's raw text and tool calls separately:

if choice.message.content is not None:
    items.append(TextPart(choice.message.content))
if choice.message.tool_calls is not None:
    for c in choice.message.tool_calls:
        items.append(ToolCallPart.from_raw_args(c.function.name, c.function.arguments, c.id))

Under the proposed JSON mode, the code might look something like:

if choice.message.content is not None:
    items.append(result_type.model_validate_json(choice.message.content))

If the model fails to output JSON text, or the output does not pass validation, retry.
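
A minimal sketch of that retry loop, assuming a hypothetical complete(prompt) function that returns the model's raw text:

from pydantic import TypeAdapter, ValidationError

def run_json_mode(complete, prompt: str, result_type, max_retries: int = 3):
    adapter = TypeAdapter(result_type)
    for _ in range(max_retries):
        raw = complete(prompt)  # hypothetical: any plain text-completion call
        try:
            return adapter.validate_json(raw)
        except ValidationError as exc:
            # Feed the validation errors back so the model can self-correct
            prompt = f"{prompt}\n\nYour previous output was invalid:\n{exc}\nReturn valid JSON only."
    raise RuntimeError("model failed to produce valid JSON")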

samuelcolvin (Member) commented:

See #514, which is related. You could implement this now in a custom model; I think that's how MistralModel works.

I don't think there's any reason to move or copy that logic into Agent.

dmontagu (Contributor) commented Jan 3, 2025

I'd be open to proposals/PRs with tweaks to the current model implementation that would make it easier to subclass/override and add functionality like this.

However, I will note that we can probably improve the handling of schemas with `format` in their fields independently; I'll open a PR to do that shortly.

lazyhope (Author) commented Jan 4, 2025

Thanks. I'll explore what can be done, as I still believe this is a crucial feature missing from many frameworks.

Its implementation should not introduce significant complexity to the project, as it primarily involves prompting and validating string content using Pydantic models. Moreover, it's broadly applicable across all LLMs.

aisensiy commented:

Currently, the open source model serving project vLLM does not support tool_choice=required, which breaks structured output.

Error Code: 400 - BadRequestError

Details:
OpenAIException - Error Code: 400
{
    "object": "error",
    "message": "[{
        'type': 'value_error',
        'loc': ('body',),
        'msg': 'Value error, `tool_choice` must either be a named tool, \"auto\", or \"none\".',
        'input': {
            'messages': [{'role': 'user', 'content': 'USA Capital'}],
            'model': 'qwen2.5-32b-awq',
            'n': 1,
            'parallel_tool_calls': True,
            'tool_choice': 'required',
            'tools': [{
                'type': 'function',
                'function': {
                    'name': 'final_result',
                    'description': 'The final response which ends this conversation',
                    'parameters': {
                        'properties': {
                            'city': {'title': 'City', 'type': 'string'},
                            'country': {'title': 'Country', 'type': 'string'},
                            'reason': {'title': 'Reason', 'type': 'string'}
                        },
                        'required': ['city', 'country', 'reason'],
                        'title': 'MyModel',
                        'type': 'object'
                    }
                }
            }]
        },
        'ctx': {
            'error': "ValueError('`tool_choice` must either be a named tool, \"auto\", or \"none\".')"
        }
    }]",
    "type": "BadRequestError",
    "param": None,
    "code": 400
}
Received Model Group: qwen2.5-32b
Available Model Group Fallbacks: None

But OpenAI-style structured output is supported:

Request:

{
    "model": "qwen2.5-32b",
    "temperature": 0.1,
    "messages": [
        {
            "role": "user",
            "content": "North city in the US"
        }
    ],
    "extra_body": {
        "guided_json": {
            "properties": {
                "city": {
                    "title": "City",
                    "type": "string"
                },
                "country": {
                    "title": "Country",
                    "type": "string"
                },
                "reason": {
                    "title": "Reason",
                    "type": "string"
                }
            },
            "required": [
                "city",
                "country",
                "reason"
            ],
            "title": "MyModel",
            "type": "object"
        }
    }
}

Output:

{
    "id": "chatcmpl-3d629978021b407d8163add87355a758",
    "created": 1736494263,
    "model": "qwen2.5-32b-awq",
    "object": "chat.completion",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "{\"city\": \"Seattle\", \"country\": \"US\", \"reason\": \"Seattle is often referred to as the 'Emerald City' and is located in the northern part of the United States.\"}",
                "role": "assistant",
                "tool_calls": null,
                "function_call": null
            }
        }
    ],
    "usage": {
        "completion_tokens": 42,
        "prompt_tokens": 192,
        "total_tokens": 234,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "service_tier": null,
    "prompt_logprobs": null
}
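
For reference, here is roughly how that request can be sent with the official openai Python client pointed at a vLLM endpoint (the base URL and API key are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "properties": {
        "city": {"title": "City", "type": "string"},
        "country": {"title": "Country", "type": "string"},
        "reason": {"title": "Reason", "type": "string"},
    },
    "required": ["city", "country", "reason"],
    "title": "MyModel",
    "type": "object",
}

response = client.chat.completions.create(
    model="qwen2.5-32b-awq",
    temperature=0.1,
    messages=[{"role": "user", "content": "North city in the US"}],
    extra_body={"guided_json": schema},  # vLLM's guided-decoding parameter
)
print(response.choices[0].message.content)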

@samuelcolvin samuelcolvin added the enhancement New feature or request label Jan 16, 2025
samuelcolvin (Member) commented:

We should support structured outputs as well as tool calls for the result_type where the model supports it.

@samuelcolvin samuelcolvin changed the title [Feature] Add Support for Prompt-Based JSON Parsing Mode as an Alternative to Tool Calling Structured outputs as an alternative to Tool Calling Jan 16, 2025
Finndersen commented:

Looks like #242 is also related?

Seems like structured outputs is the way to go since many providers support it natively

lazyhope (Author) commented:

> We should support structured outputs as well as tool calls for the result_type where the model supports it.

Please note that the Structured Output APIs from both of these providers (OpenAI and Gemini) have limitations: they only support a subset of JSON Schema. Keywords like additionalProperties won't work. An example of a Pydantic model that is not supported:

from typing import Annotated

from pydantic import BaseModel, Field

class User(BaseModel):
    details: dict[
        Annotated[str, Field(description="User name", min_length=1)],
        Annotated[int, Field(description="User ID", gt=3)],
    ] = Field(max_length=1)

Its corresponding JSON schema:

{
	"properties": {
		"details": {
			"additionalProperties": {
				"description": "User ID",
				"exclusiveMinimum": 3,
				"type": "integer"
			},
			"maxProperties": 1,
			"propertyNames": {
				"description": "User name",
				"minLength": 1
			},
			"title": "Details",
			"type": "object"
		}
	},
	"required": ["details"],
	"title": "User",
	"type": "object"
}
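
As a rough sketch, one could pre-flight a schema for keywords these APIs reject before deciding between structured outputs and a prompt-based fallback (the keyword list here is illustrative, not exhaustive):

UNSUPPORTED = {"additionalProperties", "propertyNames", "maxProperties", "patternProperties"}

def unsupported_keywords(schema: dict) -> set[str]:
    # additionalProperties: false is fine (OpenAI even requires it); a subschema is not
    found = {k for k, v in schema.items() if k in UNSUPPORTED and v is not False}
    for value in schema.values():
        if isinstance(value, dict):
            found |= unsupported_keywords(value)
        elif isinstance(value, list):
            found |= {k for item in value if isinstance(item, dict) for k in unsupported_keywords(item)}
    return found

# For the User model above this reports propertyNames, maxProperties and additionalProperties
print(unsupported_keywords(User.model_json_schema()))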

Some useful references:
https://platform.openai.com/docs/guides/structured-outputs/examples#supported-schemas
https://dylancastillo.co/posts/gemini-structured-outputs.html
https://arxiv.org/abs/2408.02442

kerolos-sss commented Jan 31, 2025

Is there something like CodeAgent in smolagents?

They parse a code snippet that acts as the tool call. This might be very easy to adopt: providing raw Python documentation for a function or a model declaration would be enough. A rough sketch of the idea follows.
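
A hedged sketch of that pattern: extract a fenced Python snippet from the completion and execute it against an allow-listed set of tools (illustrative only; real sandboxing needs far more care, and the result-variable convention is invented here):

import re

def run_code_action(raw: str, tools: dict):
    # Extract the first ```python block from the model output
    match = re.search(r"```python\s*(.*?)```", raw, re.DOTALL)
    if match is None:
        raise ValueError("no python code block in model output")
    namespace = dict(tools)  # the only names the snippet may call
    exec(match.group(1), {"__builtins__": {}}, namespace)
    return namespace.get("result")  # snippet is expected to assign its answer to `result`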

