Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core[major]: Upgrade langchain-core to pydantic 2 #25986

Merged
merged 14 commits into from
Sep 3, 2024
Merged

Conversation

eyurtsev
Copy link
Collaborator

@eyurtsev eyurtsev commented Sep 3, 2024

This PR upgrades core to pydantic 2.

It involves a combination of manual changes together with automated code mods using gritql.

Changes and known issues:

  1. Current models override repr to be consistent with pydantic 1 (this will be removed in a follow up PR)
    Related: https://github.com/langchain-ai/langchain/pull/25986/files#diff-e5bd296179b7a72fcd4ea5cfa28b145beaf787da057e6d122aa76ee0bb8132c9R74
  2. Issue with decorator for BaseChatModel (https://github.com/langchain-ai/langchain/pull/25986/files#diff-932bf3b314b268754ef640a5b8f52da96f9024fb81dd388dcd166b5713ecdf66R202) -- cc @baskaryan
  3. name attribute in Base Runnable does not have a default -- was raising a pydantic warning due to override. We need to see if there's a way to fix to avoid making a breaking change for folks with custom runnables. (https://github.com/langchain-ai/langchain/pull/25986/files#diff-836773d27f8565f4dd45e9d6cf828920f89991a880c098b7511e0d3bb78a8a0dR238)
  4. Likely can remove hard-coded RunnableBranch name (https://github.com/langchain-ai/langchain/pull/25986/files#diff-72894b94f70b1bfc908eb4d53f5ff90bb33bf8a4240a5e34cae48ddc62ac313aR147)
  5. model_* namespace is reserved in pydantic. We'll need to specify protected_namespaces
  6. create_model does not have a cached path yet
  7. get_input_schema() in many places has been updated to be explicit about whether parameters are required or optional
  8. injected tool args aren't picked up properly (losing type annotation)

For posterity the following gritql migrations were used:

engine marzano(0.1)
language python

or {
    `from $IMPORT import $...` where {
        $IMPORT <: contains `pydantic_v1`,
        $IMPORT => `pydantic`
    },
    `$X.update_forward_refs` => `$X.model_rebuild`,
  // This pattern still needs fixing as it fails (populate_by_name vs.
  // allow_populate_by_name)
  class_definition($name, $body) as $C where {
      $name <: `Config`,
      $body <: block($statements),
      $t = "",
      $statements <: some bubble($t) assignment(left=$x, right=$y) as $A where {    
        or {
            $x <: `allow_population_by_field_name` where {
                $t += `populate_by_name=$y,`
            },
            $t += `$x=$y,`
        }
      },
      $C => `model_config = ConfigDict($t)`,
      add_import(source="pydantic", name="ConfigDict")
  }
}

engine marzano(0.1)
language python

`@root_validator(pre=True)` as $decorator where {
    $decorator <: before function_definition($body, $return_type),
    $decorator => `@model_validator(mode="before")\n@classmethod`,
    add_import(source="pydantic", name="model_validator"),
    $return_type => `Any`
}
engine marzano(0.1)
language python

`@root_validator(pre=False, skip_on_failure=True)` as $decorator where {
    $decorator <: before function_definition($body, $parameters, $return_type) where {
        $body <: contains bubble or {
            `values["$Q"]` => `self.$Q`,
            `values.get("$Q")` => `(self.$Q or None)`,
            `values.get($Q, $...)` as $V where {
                $Q <: contains `"$QName"`,
                $V => `self.$QName`,
            },
            `return $Q` => `return self`
        }
    },
    $decorator => `@model_validator(mode="after")`,
    // Silly work around a bug in grit
    // Adding Self to pydantic and then will replace it with one from typing
    add_import(source="pydantic", name="model_validator"),
    $parameters => `self`,
    $return_type => `Self`
}

grit apply --language python '`Self` where { add_import(source="typing_extensions", name="Self")}'

Copy link

vercel bot commented Sep 3, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchain ❌ Failed (Inspect) Sep 3, 2024 8:25pm

@efriis efriis marked this pull request as ready for review September 3, 2024 19:13
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Sep 3, 2024
@efriis efriis marked this pull request as draft September 3, 2024 19:13
@dosubot dosubot bot added the Ɑ: core Related to langchain-core label Sep 3, 2024
verbose: bool = Field(default_factory=_get_verbosity)
# Repr = False is consistent with pydantic 1 if verbose = False
# We can relax this for pydantic 2?
# TODO(Team): decide what to do here.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update todo since we know we want to undo this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken the issue was that this ends up affecting caching behavior with verbose=False and verbose=None being cached differently -- i'll double check, but i updated a TODO(0.3) for us to resolve

@@ -126,6 +130,10 @@ class BaseLanguageModel(
)
"""Optional encoder to use for counting tokens."""

model_config = ConfigDict(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this new?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's required when using pydantic 2 since cache is an attribute on the chat model and the cache is not a base model

@@ -155,3 +165,6 @@ def from_function(
args_schema=args_schema,
**kwargs,
)


Tool.model_rebuild()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add comment about what this does

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add in all places with a code mod later on -- this is the equvalent of update_forward_refs

Comment on lines +74 to +76
additional_kwargs: dict = Field(default_factory=dict, repr=False)
"""Currently inherited from BaseMessage, but not used."""
response_metadata: dict = Field(default_factory=dict, repr=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think these could technically be used

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK -- we'll address in a follow up PR and update unit tests -- it affects a bunch of snapshots

@eyurtsev eyurtsev marked this pull request as ready for review September 3, 2024 19:57
if hasattr(self.pydantic_schema, "model_validate_json"):
pydantic_args = self.pydantic_schema.model_validate_json(_result)
else:
pydantic_args = self.pydantic_schema.parse_raw(_result) # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea is we keep supporting pydantic v1 args?

@@ -252,7 +254,7 @@ def parse_result(self, result: List[Generation], *, partial: bool = False) -> An
class PydanticToolsParser(JsonOutputToolsParser):
"""Parse tools from OpenAI response."""

tools: List[TypeBaseModel]
tools: Annotated[List[TypeBaseModel], SkipValidation()]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we getting rid of TypeBaseModel?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try to support .v1 models for a bit of time to avoid more breaking changes for users, but can also drop now

@dosubot dosubot bot added the langchain Related to the langchain package label Sep 3, 2024
@@ -16,7 +18,9 @@ class LLMResult(BaseModel):
wants to return.
"""

generations: List[List[Generation]]
generations: List[
List[Union[Generation, ChatGeneration, GenerationChunk, ChatGenerationChunk]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooc why's Generation not good enough anymore, do things get coerced to Generation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pydantic 2 is stricter with respect to union discrimination

@eyurtsev eyurtsev enabled auto-merge (squash) September 3, 2024 20:17
@eyurtsev eyurtsev disabled auto-merge September 3, 2024 20:18
@eyurtsev eyurtsev enabled auto-merge (squash) September 3, 2024 20:18
@eyurtsev eyurtsev added the 0.3 prep Work done for 0.3 prep label Sep 3, 2024
@eyurtsev eyurtsev disabled auto-merge September 3, 2024 20:26
@eyurtsev eyurtsev enabled auto-merge (squash) September 3, 2024 20:29
@eyurtsev eyurtsev disabled auto-merge September 3, 2024 20:29
@eyurtsev eyurtsev changed the title core[major]: 0.3rc core[major]: Upgrade langchain-core to pydantic 2 Sep 3, 2024
@eyurtsev eyurtsev merged commit ae5a574 into v0.3rc Sep 3, 2024
14 of 15 checks passed
@eyurtsev eyurtsev deleted the eugene/0.3rc_core branch September 3, 2024 20:30
@@ -54,6 +54,7 @@
)
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain_core.utils.pydantic import PYDANTIC_MAJOR_VERSION, _create_subset_model
from langchain_core.utils.pydantic import TypeBaseModel as TypeBaseModel
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why needed

@@ -874,15 +875,13 @@ async def _arun(self) -> str:

def test_optional_subset_model_rewrite() -> None:
class MyModel(BaseModel):
a: Optional[str]
a: Optional[str] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooc why needed

},
{
"id": messages[4].tool_call_id,
"type": "function",
"function": {"name": "FakeCall", "arguments": '{"data": "ToolCall3"}'},
"function": {"name": "FakeCall", "arguments": '{"data":"ToolCall3"}'},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still valid json

@@ -92,12 +94,12 @@ def validator(cls, v: Dict[str, Any]) -> Dict[str, Any]:
def test_is_basemodel_subclass() -> None:
"""Test pydantic."""
if PYDANTIC_MAJOR_VERSION == 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do need need to test pydantic v1 anymore?

Comment on lines +165 to +170
# Pydantic generics change the class name. So we need to do the following
if (
"origin" in cls.__pydantic_generic_metadata__
and cls.__pydantic_generic_metadata__["origin"] is not None
):
original_name = cls.__pydantic_generic_metadata__["origin"].__name__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel like ive seen this logic in a few places, maybe worth factoring out?

@@ -282,7 +316,8 @@ def _is_field_useful(inst: Serializable, key: str, value: Any) -> bool:
except Exception as _:
value_neq_default = False

return field.required is True or value_is_truthy or value_neq_default
# If value is falsy and does not match the default
return value_is_truthy or value_neq_default
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit if value_is_truth we would've already returned

NO_DEFAULT = object()


def create_base_class(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit tests for this would be great

fields[arg] = (new_arg_type, Field(**field_kwargs))
model = create_model(typed_dict.__name__, **fields)
fields[arg] = (new_arg_type, Field_v1(**field_kwargs))
model = create_model_v1(typed_dict.__name__, **fields)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're still creating v1 models? is this to minimize stuff we need to test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.3 prep Work done for 0.3 prep Ɑ: core Related to langchain-core langchain Related to the langchain package size:L This PR changes 100-499 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants