
Wasted Good models #758

Closed

kerolos-gadalla opened this issue Jan 24, 2025 · 7 comments
Labels
question Further information is requested

Comments

@kerolos-gadalla

Can we find a way to adapt non-tool-use models to Pydantic AI's tool use? It should be pretty much just prompting and parsing.

@samuelcolvin
Member

I don't understand what you're asking; please can you provide more details?

Also please no all caps.

@samuelcolvin samuelcolvin added the question Further information is requested label Jan 24, 2025
@kerolos-gadalla kerolos-gadalla changed the title WASTERD GOOD MODELS Wasted Good models Jan 24, 2025
@kerolos-gadalla
Author

We have several promising models, such as Phi and DeepSeek R1, that excel in various aspects. However, these models do not natively support tool use, as they lack the necessary token structures for delimiting tool use queries and responses.

That said, with a well-designed prompt, these models can be adapted to produce structured outputs suitable for tool use. Given that Pydantic AI is heavily optimized for tool use and works best with structured outputs, we are currently underutilizing these high-quality models by not integrating them effectively.

The key challenge is designing the right prompting and parsing strategy to bridge this gap. With the right approach, we can make non-tool-use models work seamlessly within a tool-using framework like Pydantic AI, maximizing their potential.

@kerolos-gadalla
Author

The adapter could live at the API level, as a generic layer that maps model answers to tool calls and back, or it could go inside the flows of Pydantic AI itself.

@HamzaFarhan

Would a "well-designed prompt" be as reliable as result_type, though?
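One common way to close the reliability gap is a validation-retry loop: parse the model's reply, and when validation fails, feed the error back so the model can correct itself. The sketch below is hypothetical; `call_model` is a hard-coded stand-in for a real LLM call (it deliberately returns a malformed reply first), and the `Answer` schema is invented for the example.

```python
# Hypothetical retry loop: prompt-based structured output becomes more
# reliable when Pydantic validation errors are fed back to the model.
from pydantic import BaseModel, ValidationError


class Answer(BaseModel):
    city: str
    temperature_c: float


def call_model(prompt: str, attempt: int) -> str:
    # Stand-in for an LLM: the first reply is invalid, the second is valid.
    if attempt == 0:
        return '{"city": "Paris"}'  # missing temperature_c
    return '{"city": "Paris", "temperature_c": 18.5}'


def structured_answer(prompt: str, max_retries: int = 3) -> Answer:
    for attempt in range(max_retries):
        reply = call_model(prompt, attempt)
        try:
            return Answer.model_validate_json(reply)
        except ValidationError as exc:
            # Feed the error back so the model can self-correct next turn.
            prompt += f"\nYour last reply was invalid: {exc}\nReply with valid JSON only."
    raise RuntimeError("model never produced valid output")


print(structured_answer("What's the weather in Paris?"))
```

This is essentially the same reflection mechanism that tool-calling frameworks use internally when a tool-call payload fails schema validation.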

@izzyacademy
Contributor

@kerolos-gadalla I think the main issue you are having is probably how to use the Agent class correctly. I don't believe these models (DeepSeek R1, Phi3, Phi4, etc.) are wasted. I am able to use the Agent class with pretty much any model I choose. You just have to be conscious of the capabilities and limitations of the LLM you are using.

  • You should not specify a result type in the Agent constructor if your LLM is not able to handle structured output.
  • You should not associate your Agent object with tools if your LLM is unable to handle them properly.

The smaller models have clear limitations, and yet developers keep trying to make them sing soprano when they can't even get out of the bass/baritone range.

We are seeing a barrage of GitHub issues about structured output and tool calling with these small models, and I think it comes from a place of not understanding the models' capabilities and how to use them with the Agent class.

I hope this explains it better.

@samuelcolvin
Member

Duplicate of #582; we intend to support structured outputs as well as tool calls for structured result types.

@kerolos-sss

Is there something like CodeAgent in smolagents?
