
Wasted Good models #758

Closed

kerolos-gadalla opened this issue Jan 24, 2025 · 7 comments
Labels
question Further information is requested

Comments

@kerolos-gadalla

Can we find a way to adapt non-tool-use models to Pydantic AI's tool use? It should be pretty much just prompting and parsing.

@samuelcolvin
Member

I don't understand what you're asking; please can you provide more details?

Also please no all caps.

@samuelcolvin samuelcolvin added the question Further information is requested label Jan 24, 2025
@kerolos-gadalla kerolos-gadalla changed the title WASTERD GOOD MODELS Wasted Good models Jan 24, 2025
@kerolos-gadalla
Author

We have several promising models, such as Phi and DeepSeek R1, that excel in various aspects. However, these models do not natively support tool use, as they lack the necessary token structures for delimiting tool use queries and responses.

That said, with a well-designed prompt, these models can be adapted to produce structured outputs suitable for tool use. Given that Pydantic AI is heavily optimized for tool use and works best with structured outputs, we are currently underutilizing these high-quality models by not integrating them effectively.

The key challenge is designing the right prompting and parsing strategy to bridge this gap. With the right approach, we can make non-tool-use models work seamlessly within a tool-using framework like Pydantic AI, maximizing their potential.

@kerolos-gadalla
Author

The adapter could live at the API level, as a generic layer that maps model answers to tool calls and back, or it could go inside the flows of Pydantic AI itself.

@HamzaFarhan

Would a "well-designed prompt" be as reliable as result_type, though?
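One common way to close the reliability gap is a validation-retry loop: parse the model's reply, and when validation fails, feed the error back so the model can correct itself. The sketch below is hypothetical; `call_model` is a hard-coded stand-in for a real LLM call (it deliberately returns a malformed reply first), and the `Answer` schema is invented for the example.

```python
# Hypothetical retry loop: prompt-based structured output becomes more
# reliable when Pydantic validation errors are fed back to the model.
from pydantic import BaseModel, ValidationError


class Answer(BaseModel):
    city: str
    temperature_c: float


def call_model(prompt: str, attempt: int) -> str:
    # Stand-in for an LLM: the first reply is invalid, the second is valid.
    if attempt == 0:
        return '{"city": "Paris"}'  # missing temperature_c
    return '{"city": "Paris", "temperature_c": 18.5}'


def structured_answer(prompt: str, max_retries: int = 3) -> Answer:
    for attempt in range(max_retries):
        reply = call_model(prompt, attempt)
        try:
            return Answer.model_validate_json(reply)
        except ValidationError as exc:
            # Feed the error back so the model can self-correct next turn.
            prompt += f"\nYour last reply was invalid: {exc}\nReply with valid JSON only."
    raise RuntimeError("model never produced valid output")


print(structured_answer("What's the weather in Paris?"))
```

This is essentially the same reflection mechanism that tool-calling frameworks use internally when a tool-call payload fails schema validation.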

@izzyacademy
Contributor

@kerolos-gadalla I think the main issue you are having is probably how to use the Agent class correctly. I don't believe these models (DeepSeek R1, Phi3, Phi4, etc.) are wasted. I am able to use the Agent class with pretty much any model I choose. You just have to be conscious of the capabilities and limitations of the LLM you are using.

  • You should not specify a result type in the Agent constructor if your LLM is not able to handle structured output.
  • You should not associate your Agent object with tools if your LLM is unable to handle them properly.

The smaller models have clear limitations, and yet developers keep trying to make them sing soprano when they can't even get out of the bass/baritone range.

We are seeing a barrage of GitHub issues about structured output and tool calling with these small models, and I think it comes from a place of not understanding the models' capabilities and how to use them with the Agent class.

I hope this explains it better.

@samuelcolvin
Member

Duplicate of #582; we intend to support structured outputs as well as tool calls for structured result types.

@kerolos-sss

Is there something like CodeAgent in smolagents?
