core[major]: Upgrade langchain-core to pydantic 2 #25986

eyurtsev · 2024-09-03T19:03:49Z

This PR upgrades core to pydantic 2.

It involves a combination of manual changes together with automated code mods using gritql.

Changes and known issues:

- Current models override repr to be consistent with pydantic 1 (this will be removed in a follow-up PR). Related: https://github.com/langchain-ai/langchain/pull/25986/files#diff-e5bd296179b7a72fcd4ea5cfa28b145beaf787da057e6d122aa76ee0bb8132c9R74
- Issue with decorator for BaseChatModel (https://github.com/langchain-ai/langchain/pull/25986/files#diff-932bf3b314b268754ef640a5b8f52da96f9024fb81dd388dcd166b5713ecdf66R202) -- cc @baskaryan
- The name attribute in the base Runnable does not have a default -- it was raising a pydantic warning due to the override. We need to find a fix that avoids a breaking change for folks with custom runnables. (https://github.com/langchain-ai/langchain/pull/25986/files#diff-836773d27f8565f4dd45e9d6cf828920f89991a880c098b7511e0d3bb78a8a0dR238)
- Likely can remove the hard-coded RunnableBranch name (https://github.com/langchain-ai/langchain/pull/25986/files#diff-72894b94f70b1bfc908eb4d53f5ff90bb33bf8a4240a5e34cae48ddc62ac313aR147)
- The model_* namespace is reserved in pydantic. We'll need to specify protected_namespaces.
- create_model does not have a cached path yet.
- get_input_schema() has been updated in many places to be explicit about whether parameters are required or optional.
- Injected tool args aren't picked up properly (the type annotation is lost).
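
To illustrate the `model_*` item: a minimal sketch, assuming a hypothetical `FakeChatModel` class (not part of langchain-core), of how `protected_namespaces` can be cleared so that field names starting with `model_` are accepted without a namespace warning. Whether core should use an empty tuple or a narrower setting is not decided in this PR.

```python
import warnings

from pydantic import BaseModel, ConfigDict


class FakeChatModel(BaseModel):
    """Hypothetical model; the name is for illustration only."""

    # An empty tuple turns off pydantic's reserved-prefix check for this class.
    model_config = ConfigDict(protected_namespaces=())

    model_name: str = "demo"


def count_namespace_warnings() -> int:
    """Define a class with a model_* field and count namespace warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")

        class Inner(BaseModel):
            model_config = ConfigDict(protected_namespaces=())
            model_id: str = "x"

    return len([w for w in caught if "protected namespace" in str(w.message)])
```

The setting is inherited, so placing it on a shared base class would cover subclasses as well.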

For posterity the following gritql migrations were used:

engine marzano(0.1)
language python

or {
    `from $IMPORT import $...` where {
        $IMPORT <: contains `pydantic_v1`,
        $IMPORT => `pydantic`
    },
    `$X.update_forward_refs` => `$X.model_rebuild`,
  // This pattern still needs fixing as it fails (populate_by_name vs.
  // allow_population_by_field_name)
  class_definition($name, $body) as $C where {
      $name <: `Config`,
      $body <: block($statements),
      $t = "",
      $statements <: some bubble($t) assignment(left=$x, right=$y) as $A where {    
        or {
            $x <: `allow_population_by_field_name` where {
                $t += `populate_by_name=$y,`
            },
            $t += `$x=$y,`
        }
      },
      $C => `model_config = ConfigDict($t)`,
      add_import(source="pydantic", name="ConfigDict")
  }
}

engine marzano(0.1)
language python

`@root_validator(pre=True)` as $decorator where {
    $decorator <: before function_definition($body, $return_type),
    $decorator => `@model_validator(mode="before")\n@classmethod`,
    add_import(source="pydantic", name="model_validator"),
    $return_type => `Any`
}

engine marzano(0.1)
language python

`@root_validator(pre=False, skip_on_failure=True)` as $decorator where {
    $decorator <: before function_definition($body, $parameters, $return_type) where {
        $body <: contains bubble or {
            `values["$Q"]` => `self.$Q`,
            `values.get("$Q")` => `(self.$Q or None)`,
            `values.get($Q, $...)` as $V where {
                $Q <: contains `"$QName"`,
                $V => `self.$QName`,
            },
            `return $Q` => `return self`
        }
    },
    $decorator => `@model_validator(mode="after")`,
    // Silly work around a bug in grit
    // Adding Self to pydantic and then will replace it with one from typing
    add_import(source="pydantic", name="model_validator"),
    $parameters => `self`,
    $return_type => `Self`
}

grit apply --language python '`Self` where { add_import(source="typing_extensions", name="Self")}'
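
For readers unfamiliar with the target API, the sketch below shows roughly what code looks like after the two `root_validator` rewrites. `Example` and its fields are hypothetical and are not taken from langchain-core. The before validator receives the raw input and returns it, while the after validator works on `self`.

```python
from typing import Any, Optional

from pydantic import BaseModel, model_validator
from typing_extensions import Self


class Example(BaseModel):
    """Hypothetical model showing both validator styles after migration."""

    name: str
    alias: Optional[str] = None

    # Was: @root_validator(pre=True)
    @model_validator(mode="before")
    @classmethod
    def fill_name(cls, values: Any) -> Any:
        # Runs on the raw input, before field validation.
        if isinstance(values, dict) and "name" not in values:
            values = {**values, "name": "unnamed"}
        return values

    # Was: @root_validator(pre=False, skip_on_failure=True)
    @model_validator(mode="after")
    def default_alias(self) -> Self:
        # Runs on the constructed instance; values["alias"] becomes self.alias.
        if self.alias is None:
            self.alias = self.name
        return self
```

Note that the after validator must return `self`, which is why the migration rewrites `return $Q` to `return self`.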

vercel · 2024-09-03T19:03:51Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
langchain	❌ Failed (Inspect)			Sep 3, 2024 8:25pm

baskaryan · 2024-09-03T19:32:26Z

libs/core/langchain_core/language_models/base.py

-    verbose: bool = Field(default_factory=_get_verbosity)
+    # Repr = False is consistent with pydantic 1 if verbose = False
+    # We can relax this for pydantic 2?
+    # TODO(Team): decide what to do here.


update todo since we know we want to undo this

If I'm not mistaken the issue was that this ends up affecting caching behavior with verbose=False and verbose=None being cached differently -- i'll double check, but i updated a TODO(0.3) for us to resolve

baskaryan · 2024-09-03T19:32:55Z

libs/core/langchain_core/language_models/base.py

@@ -126,6 +130,10 @@ class BaseLanguageModel(
    )
    """Optional encoder to use for counting tokens."""

+    model_config = ConfigDict(


is this new?

Yes, it's required when using pydantic 2 since cache is an attribute on the chat model and the cache is not a base model
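
The diff above is truncated, so the exact settings are not shown here. As a sketch of the general issue, pydantic 2 refuses a field annotated with a plain (non-BaseModel) class unless `arbitrary_types_allowed=True` is set. `InMemoryCache` and `FakeModel` below are hypothetical stand-ins, not the real langchain-core classes.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class InMemoryCache:
    """Hypothetical plain class, not a pydantic model."""

    def __init__(self) -> None:
        self.store: dict = {}


class FakeModel(BaseModel):
    # Lets pydantic accept InMemoryCache via a simple isinstance check.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    cache: Optional[InMemoryCache] = None


def define_without_config() -> bool:
    """Return True if defining the model without the setting fails."""
    try:

        class Broken(BaseModel):
            cache: Optional[InMemoryCache] = None

    except Exception:  # pydantic 2 raises a schema-generation error here
        return True
    return False
```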

baskaryan · 2024-09-03T19:51:21Z

libs/core/langchain_core/tools/simple.py

@@ -155,3 +165,6 @@ def from_function(
            args_schema=args_schema,
            **kwargs,
        )
+
+
+Tool.model_rebuild()


could we add comment about what this does

I'll add in all places with a code mod later on -- this is the equivalent of update_forward_refs
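
A minimal sketch of what `model_rebuild()` does, using hypothetical `Node` and `Leaf` models: it resolves forward references once the referenced class exists, which is the job `update_forward_refs()` had in pydantic 1.

```python
from typing import List

from pydantic import BaseModel


class Node(BaseModel):
    """Hypothetical model that refers to a class defined later."""

    value: int
    children: List["Leaf"] = []


class Leaf(BaseModel):
    label: str


# Resolve the "Leaf" forward reference now that the class exists.
# Pydantic 1 spelled this Node.update_forward_refs().
Node.model_rebuild()

node = Node(value=1, children=[{"label": "a"}])
```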

baskaryan · 2024-09-03T19:56:25Z

libs/core/langchain_core/messages/tool.py

+    additional_kwargs: dict = Field(default_factory=dict, repr=False)
+    """Currently inherited from BaseMessage, but not used."""
+    response_metadata: dict = Field(default_factory=dict, repr=False)


i think these could technically be used

OK -- we'll address in a follow up PR and update unit tests -- it affects a bunch of snapshots

baskaryan · 2024-09-03T19:57:47Z

libs/core/langchain_core/output_parsers/openai_functions.py

+            if hasattr(self.pydantic_schema, "model_validate_json"):
+                pydantic_args = self.pydantic_schema.model_validate_json(_result)
+            else:
+                pydantic_args = self.pydantic_schema.parse_raw(_result)  # type: ignore


idea is we keep supporting pydantic v1 args?
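
A sketch of how the `hasattr` check in this diff would handle both kinds of schema, assuming the v1-style models come from the `pydantic.v1` compatibility namespace. `ArgsV1`, `ArgsV2`, and `parse_args` are hypothetical names for illustration.

```python
from typing import Any, Type

from pydantic import BaseModel
from pydantic.v1 import BaseModel as BaseModelV1


class ArgsV2(BaseModel):
    query: str


class ArgsV1(BaseModelV1):
    query: str


def parse_args(schema: Type[Any], raw_json: str) -> Any:
    """Parse JSON with whichever API the schema class provides."""
    if hasattr(schema, "model_validate_json"):
        # Pydantic 2 models
        return schema.model_validate_json(raw_json)
    # Pydantic 1 style models, which only offer parse_raw
    return schema.parse_raw(raw_json)
```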

baskaryan · 2024-09-03T19:58:10Z

libs/core/langchain_core/output_parsers/openai_tools.py

@@ -252,7 +254,7 @@ def parse_result(self, result: List[Generation], *, partial: bool = False) -> An
 class PydanticToolsParser(JsonOutputToolsParser):
    """Parse tools from OpenAI response."""

-    tools: List[TypeBaseModel]
+    tools: Annotated[List[TypeBaseModel], SkipValidation()]


are we getting rid of TypeBaseModel?

We could try to support .v1 models for a bit of time to avoid more breaking changes for users, but can also drop now
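
To show what `SkipValidation()` changes, here is a sketch with hypothetical `Strict` and `Lenient` parsers. A class that is not a pydantic 2 model (standing in for a v1 model) is rejected by `Type[BaseModel]` validation but passes through once validation is skipped.

```python
from typing import Annotated, Any, List, Type

from pydantic import BaseModel, SkipValidation, ValidationError


class LegacyTool:
    """Hypothetical stand-in for a class that is not a pydantic 2 model."""


class Strict(BaseModel):
    tools: List[Type[BaseModel]]


class Lenient(BaseModel):
    # The annotation documents intent, but pydantic does not enforce it.
    tools: Annotated[List[Type[BaseModel]], SkipValidation()]


def is_rejected(model_cls: Type[BaseModel], tools: List[Any]) -> bool:
    """Return True if constructing the model raises a validation error."""
    try:
        model_cls(tools=tools)
    except ValidationError:
        return True
    return False
```

The trade-off is that type errors in `tools` surface later, at use time, rather than at construction.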

baskaryan · 2024-09-03T20:00:13Z

libs/core/langchain_core/outputs/llm_result.py

@@ -16,7 +18,9 @@ class LLMResult(BaseModel):
    wants to return.
    """

-    generations: List[List[Generation]]
+    generations: List[
+        List[Union[Generation, ChatGeneration, GenerationChunk, ChatGenerationChunk]]


ooc why's Generation not good enough anymore, do things get coerced to Generation?

pydantic 2 is stricter with respect to union discrimination
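
The thread does not spell out the mechanism. One pydantic 2 behavior that is likely relevant: a subclass instance stored in a field annotated with the base class is serialized through the base-class schema, so subclass-only fields are dropped, while listing the subclasses in a `Union` lets the matching schema be used. The models below are hypothetical simplifications, not the real `Generation` classes.

```python
from typing import List, Union

from pydantic import BaseModel


class Gen(BaseModel):
    text: str


class ChatGen(Gen):
    message: str


class NarrowResult(BaseModel):
    generations: List[Gen]


class WideResult(BaseModel):
    generations: List[Union[Gen, ChatGen]]


item = ChatGen(text="hi", message="hello")

# Serialized through the Gen schema: the subclass-only field is dropped.
narrow = NarrowResult(generations=[item]).model_dump()

# The union lets pydantic pick the ChatGen schema for this instance.
wide = WideResult(generations=[item]).model_dump()
```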

baskaryan · 2024-09-03T20:39:55Z

libs/core/tests/unit_tests/test_tools.py

@@ -54,6 +54,7 @@
 )
 from langchain_core.utils.function_calling import convert_to_openai_function
 from langchain_core.utils.pydantic import PYDANTIC_MAJOR_VERSION, _create_subset_model
+from langchain_core.utils.pydantic import TypeBaseModel as TypeBaseModel


baskaryan · 2024-09-03T20:40:02Z

libs/core/tests/unit_tests/test_tools.py

@@ -874,15 +875,13 @@ async def _arun(self) -> str:

 def test_optional_subset_model_rewrite() -> None:
    class MyModel(BaseModel):
-        a: Optional[str]
+        a: Optional[str] = None


ooc why needed
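
The question is not answered in the thread. A likely reason is that pydantic 2 treats `Optional[str]` without a default as required (the value may be `None`, but it must be passed), whereas pydantic 1 gave it an implicit default of `None`. A sketch with hypothetical models:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class RequiredButNullable(BaseModel):
    a: Optional[str]  # pydantic 2: must be passed, but may be None


class TrulyOptional(BaseModel):
    a: Optional[str] = None  # may be omitted


def can_omit(model_cls: type) -> bool:
    """Return True if the model can be built without passing `a`."""
    try:
        model_cls()
    except ValidationError:
        return False
    return True
```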

baskaryan · 2024-09-03T20:41:19Z

libs/core/tests/unit_tests/utils/test_function_calling.py

        },
        {
            "id": messages[4].tool_call_id,
            "type": "function",
-            "function": {"name": "FakeCall", "arguments": '{"data": "ToolCall3"}'},
+            "function": {"name": "FakeCall", "arguments": '{"data":"ToolCall3"}'},


is this still valid json
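
For reference, whitespace between JSON tokens is insignificant, so both strings decode to the same object. The change is presumably because the arguments are now produced by a serializer that omits the spaces. A quick check using only the standard library:

```python
import json

spaced = '{"data": "ToolCall3"}'
compact = '{"data":"ToolCall3"}'

# Both forms parse to the same Python object.
same_payload = json.loads(spaced) == json.loads(compact)

# The compact form can be reproduced by dropping the separator padding.
compact_again = json.dumps(json.loads(spaced), separators=(",", ":"))
```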

baskaryan · 2024-09-03T20:41:55Z

libs/core/tests/unit_tests/utils/test_pydantic.py

@@ -92,12 +94,12 @@ def validator(cls, v: Dict[str, Any]) -> Dict[str, Any]:
 def test_is_basemodel_subclass() -> None:
    """Test pydantic."""
    if PYDANTIC_MAJOR_VERSION == 1:


do we need to test pydantic v1 anymore?

baskaryan · 2024-09-03T20:45:20Z

libs/core/langchain_core/load/serializable.py

+        # Pydantic generics change the class name. So we need to do the following
+        if (
+            "origin" in cls.__pydantic_generic_metadata__
+            and cls.__pydantic_generic_metadata__["origin"] is not None
+        ):
+            original_name = cls.__pydantic_generic_metadata__["origin"].__name__


feel like ive seen this logic in a few places, maybe worth factoring out?
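
If this were factored out, it might look like the sketch below. `original_class_name` is a hypothetical helper name, and `Box` is a toy generic model used only to show the renaming.

```python
from typing import Generic, Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class Box(BaseModel, Generic[T]):
    """Toy generic model; parametrizing it creates a renamed subclass."""

    item: T


def original_class_name(cls: Type[BaseModel]) -> str:
    """Return the un-parametrized class name for a pydantic 2 model class."""
    metadata = getattr(cls, "__pydantic_generic_metadata__", {})
    origin = metadata.get("origin")
    return origin.__name__ if origin is not None else cls.__name__
```

Here `Box[int].__name__` is no longer `"Box"`, while the helper returns `"Box"` for both `Box` and `Box[int]`.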

baskaryan · 2024-09-03T20:50:08Z

libs/core/langchain_core/load/serializable.py

@@ -282,7 +316,8 @@ def _is_field_useful(inst: Serializable, key: str, value: Any) -> bool:
            except Exception as _:
                value_neq_default = False

-    return field.required is True or value_is_truthy or value_neq_default
+    # If value is falsy and does not match the default
+    return value_is_truthy or value_neq_default


nit: if value_is_truthy we would've already returned

baskaryan · 2024-09-03T20:53:53Z

libs/core/langchain_core/runnables/utils.py

+NO_DEFAULT = object()
+
+
+def create_base_class(


unit tests for this would be great

baskaryan · 2024-09-03T20:56:33Z

libs/core/langchain_core/utils/function_calling.py

-                fields[arg] = (new_arg_type, Field(**field_kwargs))
-        model = create_model(typed_dict.__name__, **fields)
+                fields[arg] = (new_arg_type, Field_v1(**field_kwargs))
+        model = create_model_v1(typed_dict.__name__, **fields)


we're still creating v1 models? is this to minimize stuff we need to test?

upgrade

a4c3c26

lint pydantic extra

b69ad5d

efriis marked this pull request as ready for review September 3, 2024 19:13

dosubot bot added the "size:L" label (this PR changes 100-499 lines, ignoring generated files) Sep 3, 2024

efriis marked this pull request as draft September 3, 2024 19:13

dosubot bot added the "Ɑ: core" label (related to langchain-core) Sep 3, 2024

Merge branch 'v0.3rc' into eugene/0.3rc_core

d52a962

baskaryan reviewed Sep 3, 2024

eyurtsev added 2 commits September 3, 2024 15:32

update

988c912

Merge branch 'eugene/0.3rc_core' of github.com:langchain-ai/langchain…

d659b07

… into eugene/0.3rc_core

baskaryan reviewed Sep 3, 2024

eyurtsev commented Sep 3, 2024

libs/core/langchain_core/language_models/base.py (outdated, resolved)

eyurtsev added 4 commits September 3, 2024 15:38

Update libs/core/langchain_core/language_models/base.py

b7bca34

x

6f7c065

Merge branch 'eugene/0.3rc_core' of github.com:langchain-ai/langchain…

7ec7822

… into eugene/0.3rc_core

ugh one more import

c890fcd

baskaryan reviewed Sep 3, 2024

x

f39f196

baskaryan reviewed Sep 3, 2024

libs/core/langchain_core/output_parsers/json.py (outdated, resolved)

eyurtsev marked this pull request as ready for review September 3, 2024 19:57

baskaryan reviewed Sep 3, 2024

dosubot bot added the "langchain" label (related to the langchain package) Sep 3, 2024

baskaryan reviewed Sep 3, 2024

libs/core/langchain_core/prompts/chat.py (outdated, resolved)

baskaryan reviewed Sep 3, 2024

libs/core/langchain_core/runnables/base.py (outdated, resolved)

baskaryan reviewed Sep 3, 2024

libs/core/langchain_core/runnables/base.py (outdated, resolved)

eyurtsev commented Sep 3, 2024

libs/core/langchain_core/prompts/chat.py (outdated, resolved)

eyurtsev added 4 commits September 3, 2024 16:06

Update libs/core/langchain_core/prompts/chat.py

3ba41c2

update comment

ef50011

Merge branch 'eugene/0.3rc_core' of github.com:langchain-ai/langchain…

f6eb475

… into eugene/0.3rc_core

remove pydantic: ignore

6a20727

eyurtsev enabled auto-merge (squash) September 3, 2024 20:17

eyurtsev disabled auto-merge September 3, 2024 20:18

eyurtsev enabled auto-merge (squash) September 3, 2024 20:18

eyurtsev added the "0.3 prep" label (work done for 0.3 prep) Sep 3, 2024

vercel bot had a problem deploying to Preview September 3, 2024 20:25 Failure

eyurtsev disabled auto-merge September 3, 2024 20:26

eyurtsev enabled auto-merge (squash) September 3, 2024 20:29

eyurtsev disabled auto-merge September 3, 2024 20:29

eyurtsev changed the title ~~core[major]: 0.3rc~~ core[major]: Upgrade langchain-core to pydantic 2 Sep 3, 2024

eyurtsev merged commit ae5a574 into v0.3rc Sep 3, 2024
14 of 15 checks passed

eyurtsev deleted the eugene/0.3rc_core branch September 3, 2024 20:30

efriis mentioned this pull request Sep 13, 2024

erick/v03 merge master again #26444

Merged
