core[major]: Upgrade langchain-core to pydantic 2 #25986
verbose: bool = Field(default_factory=_get_verbosity)
# Repr = False is consistent with pydantic 1 if verbose = False
# We can relax this for pydantic 2?
# TODO(Team): decide what to do here.
Update the TODO, since we know we want to undo this.
If I'm not mistaken, the issue was that this ends up affecting caching behavior, with verbose=False and verbose=None being cached differently. I'll double-check, but I've updated it to a TODO(0.3) for us to resolve.
@@ -126,6 +130,10 @@ class BaseLanguageModel(
    )
    """Optional encoder to use for counting tokens."""

    model_config = ConfigDict(
Is this new?
Yes, it's required when using pydantic 2, since cache is an attribute on the chat model and the cache is not a BaseModel.
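The `ConfigDict(...)` arguments are cut off in the diff above. The sketch below assumes the relevant setting is `arbitrary_types_allowed=True`, which is what pydantic 2 asks for when a field is annotated with a plain (non-`BaseModel`) class. `InMemoryCache`, `FakeChatModel`, and `schema_builds_without_config` are hypothetical stand-ins, not the langchain-core classes.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict
from pydantic.errors import PydanticSchemaGenerationError


class InMemoryCache:
    """Hypothetical cache: a plain class, not a BaseModel."""

    def __init__(self) -> None:
        self.store: dict = {}


class FakeChatModel(BaseModel):
    # Assumed setting: lets pydantic 2 accept InMemoryCache via an isinstance check.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    cache: Optional[InMemoryCache] = None


def schema_builds_without_config() -> bool:
    """Return False if the same model fails to build without the config."""
    try:

        class BrokenChatModel(BaseModel):
            cache: Optional[InMemoryCache] = None

    except PydanticSchemaGenerationError:
        return False
    return True


model = FakeChatModel(cache=InMemoryCache())
```

Without the config, pydantic 2 raises while the class is being defined, not when it is instantiated.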
@@ -155,3 +165,6 @@ def from_function(
            args_schema=args_schema,
            **kwargs,
        )


Tool.model_rebuild()
Could we add a comment about what this does?
I'll add it in all places with a code mod later on. This is the equivalent of update_forward_refs.
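A minimal sketch of what `model_rebuild()` does, using hypothetical `Node`/`Child` models rather than `Tool`: a field annotation refers to a class that does not exist yet when the model is defined, and calling `model_rebuild()` afterwards resolves that forward reference, as `update_forward_refs()` did in pydantic 1.

```python
from typing import Optional

from pydantic import BaseModel


class Node(BaseModel):
    # "Child" is not defined yet, so pydantic 2 leaves Node's schema incomplete.
    child: Optional["Child"] = None


class Child(BaseModel):
    name: str


# pydantic 2 counterpart of pydantic 1's Node.update_forward_refs():
# resolve the forward reference now that Child exists.
Node.model_rebuild()

node = Node(child={"name": "leaf"})
```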
additional_kwargs: dict = Field(default_factory=dict, repr=False)
"""Currently inherited from BaseMessage, but not used."""
response_metadata: dict = Field(default_factory=dict, repr=False)
I think these could technically be used.
OK, we'll address this in a follow-up PR and update the unit tests; it affects a bunch of snapshots.
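For reference, `repr=False` only hides a field from `repr()`. The field is still validated, stored, and included in `model_dump()`, which is why it "could technically be used". `FakeMessage` below is a hypothetical model that mirrors the two fields in the diff.

```python
from pydantic import BaseModel, Field


class FakeMessage(BaseModel):
    """Hypothetical message model mirroring the two fields in the diff."""

    content: str
    # repr=False hides these fields from repr(), but they stay fully usable.
    additional_kwargs: dict = Field(default_factory=dict, repr=False)
    response_metadata: dict = Field(default_factory=dict, repr=False)


msg = FakeMessage(content="hi", response_metadata={"model": "x"})
```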
if hasattr(self.pydantic_schema, "model_validate_json"):
    pydantic_args = self.pydantic_schema.model_validate_json(_result)
else:
    pydantic_args = self.pydantic_schema.parse_raw(_result)  # type: ignore
Is the idea that we keep supporting pydantic v1 args?
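A standalone sketch of the dual path in the snippet above, assuming "pydantic v1 args" means schemas written against the `pydantic.v1` namespace that ships with pydantic 2. Only pydantic 2 models have `model_validate_json`; `pydantic.v1` models fall through to the older `parse_raw`. `parse_args`, `ArgsV2`, and `ArgsV1` are hypothetical names.

```python
from typing import Any, Type, Union

from pydantic import BaseModel
from pydantic.v1 import BaseModel as BaseModelV1


def parse_args(schema: Union[Type[BaseModel], Type[BaseModelV1]], raw: str) -> Any:
    """Hypothetical helper: parse JSON with either a pydantic 2 or a pydantic.v1 model."""
    if hasattr(schema, "model_validate_json"):
        # pydantic 2 models expose model_validate_json.
        return schema.model_validate_json(raw)
    # pydantic.v1 models only have the older parse_raw API.
    return schema.parse_raw(raw)


class ArgsV2(BaseModel):
    data: str


class ArgsV1(BaseModelV1):
    data: str
```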
@@ -252,7 +254,7 @@ def parse_result(self, result: List[Generation], *, partial: bool = False) -> An
class PydanticToolsParser(JsonOutputToolsParser):
    """Parse tools from OpenAI response."""

-    tools: List[TypeBaseModel]
+    tools: Annotated[List[TypeBaseModel], SkipValidation()]
Are we getting rid of TypeBaseModel?
We could keep supporting .v1 models for a while to avoid more breaking changes for users, but we could also drop them now.
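A sketch of what `SkipValidation()` changes for a list of model classes. It assumes `TypeBaseModel` is meant to admit `pydantic.v1` classes as well (its definition is not shown in the diff), so the example uses plain `Type[BaseModel]` instead. A bare `Type[BaseModel]` annotation is validated with an `issubclass` check against pydantic 2's `BaseModel`, which rejects a `pydantic.v1` class; wrapping it in `SkipValidation()` stores the value as passed. `ToolV1`, `StrictParser`, `LenientParser`, and `accepts_v1_tools` are hypothetical.

```python
from typing import Annotated, List, Type

from pydantic import BaseModel, SkipValidation, ValidationError
from pydantic.v1 import BaseModel as BaseModelV1


class ToolV1(BaseModelV1):
    """Hypothetical tool schema written against the pydantic.v1 namespace."""

    query: str


class StrictParser(BaseModel):
    # Validated with issubclass(value, pydantic 2 BaseModel).
    tools: List[Type[BaseModel]]


class LenientParser(BaseModel):
    # SkipValidation keeps whatever was passed, so pydantic.v1 classes get through.
    tools: Annotated[List[Type[BaseModel]], SkipValidation()]


def accepts_v1_tools(parser_cls: Type[BaseModel]) -> bool:
    try:
        parser_cls(tools=[ToolV1])
    except ValidationError:
        return False
    return True
```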
@@ -16,7 +18,9 @@ class LLMResult(BaseModel):
    wants to return.
    """

-    generations: List[List[Generation]]
+    generations: List[
+        List[Union[Generation, ChatGeneration, GenerationChunk, ChatGenerationChunk]]
Out of curiosity, why isn't Generation good enough anymore? Do things get coerced to Generation?
Pydantic 2 is stricter with respect to union discrimination.
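One related pydantic 2 behavior, shown as an illustration rather than as the exact failure this PR hit: a subclass instance stored in a field annotated with the parent model is kept as-is, but it is serialized as if it were the parent, so subclass-only fields disappear from `model_dump()`. `Generation`, `ChatGeneration`, and `NarrowResult` here are minimal hypothetical stand-ins; the real langchain-core classes have more fields.

```python
from typing import List

from pydantic import BaseModel


class Generation(BaseModel):
    """Minimal stand-in for the real Generation class."""

    text: str


class ChatGeneration(Generation):
    # Subclass-only field (a plain str here for simplicity).
    message: str


class NarrowResult(BaseModel):
    # Annotated with the parent type only, like the old LLMResult field.
    generations: List[List[Generation]]


result = NarrowResult(generations=[[ChatGeneration(text="hi", message="hello")]])
```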
@@ -54,6 +54,7 @@
)
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain_core.utils.pydantic import PYDANTIC_MAJOR_VERSION, _create_subset_model
from langchain_core.utils.pydantic import TypeBaseModel as TypeBaseModel
Why is this needed?
@@ -874,15 +875,13 @@ async def _arun(self) -> str:

def test_optional_subset_model_rewrite() -> None:
    class MyModel(BaseModel):
-        a: Optional[str]
+        a: Optional[str] = None
Out of curiosity, why is this needed?
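In pydantic 1, an `Optional[...]` annotation without a default implicitly defaulted to `None`. Pydantic 2 treats such a field as required but nullable, which is presumably why the test model gained `= None`. A minimal sketch (the model names are illustrative, and `is_required_without_default` is a hypothetical helper):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class MyModel(BaseModel):
    # pydantic 2: "str or None", but the field must still be provided.
    a: Optional[str]


class MyModelWithDefault(BaseModel):
    # An explicit default restores the pydantic 1 behavior.
    a: Optional[str] = None


def is_required_without_default() -> bool:
    try:
        MyModel()
    except ValidationError:
        return True
    return False
```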
    },
    {
        "id": messages[4].tool_call_id,
        "type": "function",
-        "function": {"name": "FakeCall", "arguments": '{"data": "ToolCall3"}'},
+        "function": {"name": "FakeCall", "arguments": '{"data":"ToolCall3"}'},
Is this still valid JSON?
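Both strings are valid JSON: whitespace between tokens is insignificant, so they decode to the same object. The new expected value matches the compact output of pydantic 2's `model_dump_json()`, which is likely where it comes from, while `json.dumps` inserts a space after `:` and `,` by default. `FakeCall` here is a minimal hypothetical stand-in for the test's schema.

```python
import json

from pydantic import BaseModel


class FakeCall(BaseModel):
    data: str


# pydantic 2 serializes without spaces between tokens.
compact = FakeCall(data="ToolCall3").model_dump_json()
# json.dumps defaults to ", " and ": " separators.
spaced = json.dumps({"data": "ToolCall3"})
```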
@@ -92,12 +94,12 @@ def validator(cls, v: Dict[str, Any]) -> Dict[str, Any]:
def test_is_basemodel_subclass() -> None:
    """Test pydantic."""
    if PYDANTIC_MAJOR_VERSION == 1:
Do we need to test pydantic v1 anymore?
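With pydantic 2 installed, the pydantic 1 API is still importable as `pydantic.v1`, so "pydantic v1" can mean either an old pydantic install or v1-namespace models running under pydantic 2. A sketch of a check covering both kinds of model under pydantic 2; the version constant is computed here for illustration, and `is_any_basemodel_subclass` is hypothetical, not the langchain-core helper.

```python
import inspect
from typing import Any

from pydantic import BaseModel
from pydantic.v1 import BaseModel as BaseModelV1
from pydantic.version import VERSION

# Illustrative stand-in for a PYDANTIC_MAJOR_VERSION constant.
PYDANTIC_MAJOR_VERSION = int(VERSION.split(".")[0])


def is_any_basemodel_subclass(cls: Any) -> bool:
    """True for pydantic 2 model classes and for pydantic.v1 model classes."""
    return inspect.isclass(cls) and issubclass(cls, (BaseModel, BaseModelV1))


class V2Model(BaseModel):
    x: int


class V1Model(BaseModelV1):
    x: int
```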
# Pydantic generics change the class name. So we need to do the following
if (
    "origin" in cls.__pydantic_generic_metadata__
    and cls.__pydantic_generic_metadata__["origin"] is not None
):
    original_name = cls.__pydantic_generic_metadata__["origin"].__name__
I feel like I've seen this logic in a few places; maybe it's worth factoring out?
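A sketch of what a factored-out helper could look like; the name `get_original_name` is hypothetical. Parametrizing a pydantic 2 generic model (for example `Box[int]`) creates a new class whose `__name__` includes the type arguments, while `__pydantic_generic_metadata__["origin"]` points back at the unparametrized class.

```python
from typing import Generic, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


def get_original_name(cls: type) -> str:
    """Hypothetical helper: return the unparametrized name of a (generic) model."""
    # Non-pydantic classes have no generic metadata; fall back to an empty dict.
    metadata = getattr(cls, "__pydantic_generic_metadata__", None) or {}
    origin = metadata.get("origin")
    if origin is not None:
        return origin.__name__
    return cls.__name__


class Box(BaseModel, Generic[T]):
    item: T
```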
@@ -282,7 +316,8 @@ def _is_field_useful(inst: Serializable, key: str, value: Any) -> bool:
    except Exception as _:
        value_neq_default = False

-    return field.required is True or value_is_truthy or value_neq_default
+    # If value is falsy and does not match the default
+    return value_is_truthy or value_neq_default
Nit: if value_is_truthy, we would've already returned.
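For context on the dropped `field.required is True` check: in pydantic 2, field metadata lives in the class's `model_fields` mapping as `FieldInfo` objects, and requiredness is reported by the `is_required()` method. A minimal sketch with a hypothetical `Example` model:

```python
from typing import Optional

from pydantic import BaseModel


class Example(BaseModel):
    a: str
    b: Optional[str] = None


# pydantic 2 FieldInfo reports requiredness via is_required().
required = {name: info.is_required() for name, info in Example.model_fields.items()}
```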
NO_DEFAULT = object()


def create_base_class(
Unit tests for this would be great.
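The body of `create_base_class` isn't shown, so this is only a sketch of a common use of a sentinel like `NO_DEFAULT = object()`, together with the kind of small unit test the reviewer asks for. A dedicated sentinel lets callers pass `default=None` as a real default, which a plain `default=None` parameter could not distinguish from "no default". `make_single_field_model` and `requires_value` are hypothetical.

```python
from typing import Any, Optional

from pydantic import ValidationError, create_model

NO_DEFAULT = object()


def make_single_field_model(name: str, field_type: Any, default: Any = NO_DEFAULT) -> type:
    """Hypothetical helper: one-field model, required unless a default is given."""
    if default is NO_DEFAULT:
        # Ellipsis marks the field as required in create_model.
        return create_model(name, value=(field_type, ...))
    return create_model(name, value=(field_type, default))


def requires_value(model: type) -> bool:
    try:
        model()
    except ValidationError:
        return True
    return False


# Small unit-test style checks for the helper.
Required = make_single_field_model("Required", int)
Nullable = make_single_field_model("Nullable", Optional[int], None)
```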
-fields[arg] = (new_arg_type, Field(**field_kwargs))
-model = create_model(typed_dict.__name__, **fields)
+fields[arg] = (new_arg_type, Field_v1(**field_kwargs))
+model = create_model_v1(typed_dict.__name__, **fields)
We're still creating v1 models? Is this to minimize the stuff we need to test?
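For reference, the `pydantic.v1` namespace that ships with pydantic 2 still provides the v1 `create_model` and `Field`, so a v1 model can be built from a `TypedDict`'s annotations roughly as follows. This sketch assumes `Field_v1` and `create_model_v1` are aliases of `pydantic.v1.Field` and `pydantic.v1.create_model` (the imports aren't shown in the diff). `Movie` is hypothetical, and the per-field `field_kwargs` the real code passes are omitted.

```python
from typing import TypedDict, get_type_hints

from pydantic.v1 import Field as Field_v1
from pydantic.v1 import create_model as create_model_v1


class Movie(TypedDict):
    # Hypothetical stand-in for the function's typed_dict argument.
    title: str
    year: int


fields = {}
for arg, arg_type in get_type_hints(Movie).items():
    # Field_v1(...) with Ellipsis as the default marks the field as required.
    fields[arg] = (arg_type, Field_v1(...))
model = create_model_v1(Movie.__name__, **fields)
```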
This PR upgrades core to pydantic 2.
It involves a combination of manual changes together with automated code mods using gritql.
Changes and known issues:
- Related: https://github.com/langchain-ai/langchain/pull/25986/files#diff-e5bd296179b7a72fcd4ea5cfa28b145beaf787da057e6d122aa76ee0bb8132c9R74
- The name attribute in base Runnable does not have a default; it was raising a pydantic warning due to the override. We need to see if there's a way to fix this without making a breaking change for folks with custom runnables. (https://github.com/langchain-ai/langchain/pull/25986/files#diff-836773d27f8565f4dd45e9d6cf828920f89991a880c098b7511e0d3bb78a8a0dR238)
- The model_* namespace is reserved in pydantic. We'll need to specify protected_namespaces.
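A minimal sketch of the `protected_namespaces` setting. In early pydantic 2 releases the default protected namespace is `model_`, so a field such as `model_name` triggers a warning when the class is defined (later releases narrowed the default). Setting `protected_namespaces=()` opts out. `ChatModelConfig` is hypothetical.

```python
import warnings

from pydantic import BaseModel, ConfigDict

# The protected-namespace warning fires at class-definition time,
# so the class is defined inside the catch_warnings block.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    class ChatModelConfig(BaseModel):
        # Opt out of protected namespaces so "model_name" is allowed silently.
        model_config = ConfigDict(protected_namespaces=())

        model_name: str


cfg = ChatModelConfig(model_name="my-model")
```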
For posterity, the following gritql migrations were used: