
Structured outputs as an alternative to Tool Calling #582

Open
lazyhope opened this issue Jan 1, 2025 · 11 comments
Labels
enhancement New feature or request

Comments

lazyhope commented Jan 1, 2025

Issue Description:

Currently, pydantic-ai implements structured output solely via model providers' tool-calling APIs. While this works in most cases, some schemas that Pydantic supports are handled inconsistently across model providers.

For instance, the following schema from the documentation does not work with Gemini models:

from datetime import date
from typing import TypedDict

from pydantic_ai import Agent

class UserProfile(TypedDict, total=False):
    name: str
    dob: date
    bio: str

agent = Agent(
    'gemini-2.0-flash-exp',
    result_type=UserProfile,
)
agent.run_sync("Generate synthetic data")

This results in the following error:

UnexpectedModelBehavior: Unexpected response from gemini 400, body:
{
  "error": {
    "code": 400,
    "message": "* GenerateContentRequest.tools[0].function_declarations[0].parameters.properties[dob].format: only 'enum' is supported for STRING type\n",
    "status": "INVALID_ARGUMENT"
  }
}
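
For context, Pydantic renders date fields as {"type": "string", "format": "date"}, and Gemini's function-calling API rejects that format keyword. A possible workaround in the meantime (a hypothetical sketch, not an endorsed fix) is to declare the field as a plain str and parse the date afterwards:

from datetime import date
from typing import TypedDict

class UserProfile(TypedDict, total=False):
    name: str
    dob: str  # plain ISO string avoids the unsupported `format: "date"` keyword
    bio: str

# The date can still be recovered after the run, e.g.:
# dob = date.fromisoformat(profile["dob"])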

In this example, the inconsistency stems from the model provider's limitations. However, from my experience with tools like instructor, modern LLMs are increasingly proficient at adhering to JSON-format prompts in their plain text responses. In fact, they often produce better JSON in standard completion mode than in tool-calling mode; the Berkeley Function-Calling Leaderboard may provide further evidence of this trend.

Feature Request

Would it be possible for pydantic-ai to implement an alternative mode akin to instructor's MD_JSON mode? This mode would use prompt engineering to guide the LLM's output and parse the resulting JSON from the raw text response rather than relying on tool-calling APIs (a rough sketch follows the list below).

Such a feature would:

  • Allow broader compatibility with any model capable of following JSON schema prompts.
  • Address model-specific inconsistencies while leveraging pydantic's full schema flexibility.
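
For illustration, a minimal sketch of what such a mode could look like, built only on Pydantic and the standard library (extract_json and the prompt wording are hypothetical, not part of pydantic-ai):

import json
import re

from pydantic import TypeAdapter

def extract_json(text: str) -> str:
    # Pull the first fenced code block out of a raw completion, else use the whole text
    match = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

adapter = TypeAdapter(UserProfile)
system_prompt = (
    "Respond ONLY with a JSON object matching this schema:\n"
    + json.dumps(adapter.json_schema())
)
# raw = <text returned by any plain completion API given system_prompt>
# profile = adapter.validate_json(extract_json(raw))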

Thank you for considering this suggestion!

@lazyhope lazyhope changed the title Add Support for Prompt-Based JSON Parsing Mode as an Alternative to Tool Calling [Feature] Add Support for Prompt-Based JSON Parsing Mode as an Alternative to Tool Calling Jan 1, 2025
lazyhope (Author) commented Jan 1, 2025

FYI, here’s an example where instructor’s Markdown JSON mode works seamlessly with the UserProfile schema:

import instructor
from datetime import date
from typing import TypedDict
from litellm import completion

class UserProfile(TypedDict, total=False):
    name: str
    dob: date
    bio: str

client = instructor.from_litellm(completion, mode=instructor.Mode.MD_JSON)
# Switching to `instructor.Mode.TOOLS` would result in the same error mentioned earlier
user = client.chat.completions.create(
    model="gemini/gemini-2.0-flash-exp",
    messages=[
        {"role": "user", "content": "Generate a synthetic data"},
    ],
    response_model=UserProfile,
)

user

yields

UserProfile(name='Alice Wonderland', dob=datetime.date(1990, 3, 15), bio='A curious individual who loves to explore and discover new things.')

sydney-runkle (Member) commented:

@samuelcolvin, could you please take a look at this? My understanding is that we're already using json schemas of models to guide coercing outputs to certain types...

lazyhope (Author) commented Jan 2, 2025

> @samuelcolvin, could you please take a look at this? My understanding is that we're already using json schemas of models to guide coercing outputs to certain types...

Yes, but my proposal is to have a mode that, instead of using model providers' tool-calling APIs, parses the raw text response as JSON for a given result_type. This may involve some additional prompting so that the model outputs only JSON in its response.

Here is the current implementation for OpenAI models, which parses the model's raw text and tool calls separately:

if choice.message.content is not None:
    items.append(TextPart(choice.message.content))
if choice.message.tool_calls is not None:
    for c in choice.message.tool_calls:
        items.append(ToolCallPart.from_raw_args(c.function.name, c.function.arguments, c.id))

Under the proposed JSON mode, the code might look something like:

if choice.message.content is not None:
    items.append(result_type.model_validate_json(choice.message.content))

If the model fails to output JSON text, or the output does not pass validation, retry.
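
A minimal sketch of that retry loop, assuming a hypothetical complete(prompt) function that returns the model's raw text:

from pydantic import TypeAdapter, ValidationError

def run_json_mode(complete, prompt: str, result_type, max_retries: int = 3):
    adapter = TypeAdapter(result_type)
    for _ in range(max_retries):
        raw = complete(prompt)  # hypothetical: any plain text-completion call
        try:
            return adapter.validate_json(raw)
        except ValidationError as exc:
            # Feed the validation errors back so the model can self-correct
            prompt = f"{prompt}\n\nYour previous output was invalid:\n{exc}\nReturn valid JSON only."
    raise RuntimeError("model failed to produce valid JSON")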

samuelcolvin (Member) commented:

See #514, which is related. You could implement this now in a custom model; I think that's how MistralModel works.

I don't think there's any reason to move or copy that logic into Agent.

dmontagu (Contributor) commented Jan 3, 2025

I'd be open to proposals/PRs with tweaks to the current model implementation that would make it easier to subclass/override and add functionality like this.

However, I will note that we can probably improve the handling of schemas with `format` in their fields independently; I'll open a PR to do that shortly.

lazyhope (Author) commented Jan 4, 2025

Thanks. I'll explore what can be done, as I still believe this is a crucial feature missing from many frameworks.

Its implementation should not introduce significant complexity to the project, as it primarily involves prompting and validating string content using Pydantic models. Moreover, it's broadly applicable across all LLMs.

aisensiy commented:

Currently, the open source model serving project vLLM does not support tool_choice=required, which breaks structured output.

Error Code: 400 - BadRequestError

Details:
OpenAIException - Error Code: 400
{
    "object": "error",
    "message": "[{
        'type': 'value_error',
        'loc': ('body',),
        'msg': 'Value error, `tool_choice` must either be a named tool, \"auto\", or \"none\".',
        'input': {
            'messages': [{'role': 'user', 'content': 'USA Capital'}],
            'model': 'qwen2.5-32b-awq',
            'n': 1,
            'parallel_tool_calls': True,
            'tool_choice': 'required',
            'tools': [{
                'type': 'function',
                'function': {
                    'name': 'final_result',
                    'description': 'The final response which ends this conversation',
                    'parameters': {
                        'properties': {
                            'city': {'title': 'City', 'type': 'string'},
                            'country': {'title': 'Country', 'type': 'string'},
                            'reason': {'title': 'Reason', 'type': 'string'}
                        },
                        'required': ['city', 'country', 'reason'],
                        'title': 'MyModel',
                        'type': 'object'
                    }
                }
            }]
        },
        'ctx': {
            'error': "ValueError('`tool_choice` must either be a named tool, \"auto\", or \"none\".')"
        }
    }]",
    "type": "BadRequestError",
    "param": None,
    "code": 400
}
Received Model Group: qwen2.5-32b
Available Model Group Fallbacks: None

But OpenAI-style structured output is supported:

Request:

{
    "model": "qwen2.5-32b",
    "temperature": 0.1,
    "messages": [
        {
            "role": "user",
            "content": "North city in the US"
        }
    ],
    "extra_body": {
        "guided_json": {
            "properties": {
                "city": {
                    "title": "City",
                    "type": "string"
                },
                "country": {
                    "title": "Country",
                    "type": "string"
                },
                "reason": {
                    "title": "Reason",
                    "type": "string"
                }
            },
            "required": [
                "city",
                "country",
                "reason"
            ],
            "title": "MyModel",
            "type": "object"
        }
    }
}

Output:

{
    "id": "chatcmpl-3d629978021b407d8163add87355a758",
    "created": 1736494263,
    "model": "qwen2.5-32b-awq",
    "object": "chat.completion",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "{\"city\": \"Seattle\", \"country\": \"US\", \"reason\": \"Seattle is often referred to as the 'Emerald City' and is located in the northern part of the United States.\"}",
                "role": "assistant",
                "tool_calls": null,
                "function_call": null
            }
        }
    ],
    "usage": {
        "completion_tokens": 42,
        "prompt_tokens": 192,
        "total_tokens": 234,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "service_tier": null,
    "prompt_logprobs": null
}
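
For reference, here is roughly how that request can be sent with the official openai Python client pointed at a vLLM endpoint (the base URL and API key are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "properties": {
        "city": {"title": "City", "type": "string"},
        "country": {"title": "Country", "type": "string"},
        "reason": {"title": "Reason", "type": "string"},
    },
    "required": ["city", "country", "reason"],
    "title": "MyModel",
    "type": "object",
}

response = client.chat.completions.create(
    model="qwen2.5-32b-awq",
    temperature=0.1,
    messages=[{"role": "user", "content": "North city in the US"}],
    extra_body={"guided_json": schema},  # vLLM's guided-decoding parameter
)
print(response.choices[0].message.content)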

@samuelcolvin samuelcolvin added the enhancement New feature or request label Jan 16, 2025
samuelcolvin (Member) commented:

We should support structured outputs as well as tool calls for the result_type where the model supports it.

@samuelcolvin samuelcolvin changed the title [Feature] Add Support for Prompt-Based JSON Parsing Mode as an Alternative to Tool Calling Structured outputs as an alternative to Tool Calling Jan 16, 2025
Finndersen commented:

Looks like #242 is also related?

Seems like structured outputs is the way to go since many providers support it natively

lazyhope (Author) commented:

> We should support structured outputs as well as tool calls for the result_type where the model supports it.

Please note that the Structured Output APIs from both of these providers (OpenAI and Gemini) have limitations: they only support a subset of JSON Schema. Keywords like additionalProperties won't work. An example of a Pydantic model that is not supported:

from typing import Annotated

from pydantic import BaseModel, Field

class User(BaseModel):
    details: dict[
        Annotated[str, Field(description="User name", min_length=1)],
        Annotated[int, Field(description="User ID", gt=3)],
    ] = Field(max_length=1)

Its corresponding JSON schema:

{
	"properties": {
		"details": {
			"additionalProperties": {
				"description": "User ID",
				"exclusiveMinimum": 3,
				"type": "integer"
			},
			"maxProperties": 1,
			"propertyNames": {
				"description": "User name",
				"minLength": 1
			},
			"title": "Details",
			"type": "object"
		}
	},
	"required": ["details"],
	"title": "User",
	"type": "object"
}
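
As a rough sketch, one could pre-flight a schema for keywords these APIs reject before deciding between structured outputs and a prompt-based fallback (the keyword list here is illustrative, not exhaustive):

UNSUPPORTED = {"additionalProperties", "propertyNames", "maxProperties", "patternProperties"}

def unsupported_keywords(schema: dict) -> set[str]:
    # additionalProperties: false is fine (OpenAI even requires it); a subschema is not
    found = {k for k, v in schema.items() if k in UNSUPPORTED and v is not False}
    for value in schema.values():
        if isinstance(value, dict):
            found |= unsupported_keywords(value)
        elif isinstance(value, list):
            found |= {k for item in value if isinstance(item, dict) for k in unsupported_keywords(item)}
    return found

# For the User model above this reports propertyNames, maxProperties and additionalProperties
print(unsupported_keywords(User.model_json_schema()))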

Some useful references:
https://platform.openai.com/docs/guides/structured-outputs/examples#supported-schemas
https://dylancastillo.co/posts/gemini-structured-outputs.html
https://arxiv.org/abs/2408.02442

kerolos-sss commented Jan 31, 2025

Is there something like CodeAgent in smolagents?

They parse a code snippet that acts as the tool call. This might be very easy to adopt: providing raw Python documentation for a function or a model declaration would be enough. A rough sketch of the idea follows.
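
A hedged sketch of that pattern: extract a fenced Python snippet from the completion and execute it against an allow-listed set of tools (illustrative only; real sandboxing needs far more care, and the result-variable convention is invented here):

import re

def run_code_action(raw: str, tools: dict):
    # Extract the first ```python block from the model output
    match = re.search(r"```python\s*(.*?)```", raw, re.DOTALL)
    if match is None:
        raise ValueError("no python code block in model output")
    namespace = dict(tools)  # the only names the snippet may call
    exec(match.group(1), {"__builtins__": {}}, namespace)
    return namespace.get("result")  # snippet is expected to assign its answer to `result`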

