Add tutorials #154
Conversation
Walkthrough
The updates encompass the integration of Pydantic for structured data validation and schema definition, and enhancements to the OpenAI SDK.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tutorials/1.introduction.ipynb (1 hunks)
Files not reviewed due to errors (1)
- tutorials/1.introduction.ipynb (Error: diff too large)
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- pyproject.toml
Files selected for processing (1)
- tutorials/2.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (1)
- tutorials/2.applications-rag.ipynb (Error: diff too large)
Co-authored-by: Jason Liu <[email protected]>
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- pyproject.toml
Files selected for processing (7)
- README.md (1 hunks)
- docs/index.md (8 hunks)
- instructor/init.py (2 hunks)
- instructor/patch.py (2 hunks)
- tests/test_patch.py (1 hunks)
- tutorials/1.introduction.ipynb (1 hunks)
- tutorials/2.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (1)
- tutorials/1.introduction.ipynb (Error: diff too large)
Files skipped from review due to trivial changes (2)
- docs/index.md
- tutorials/2.applications-rag.ipynb
Additional comments: 10
instructor/__init__.py (2)
1-4: The new import `apatch` from the `.patch` module is added. Ensure that the function or variable `apatch` is defined in the `.patch` module.
11-17: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [6-17]
The `apatch` is added to the `__all__` list, which means it is now part of the public interface of the `instructor` module. Make sure that this is the intended behavior and that the usage of `apatch` is documented.
README.md (3)
45-45: The assertion is correct and ensures that the age is correctly extracted.
48-62: The instructions for using `openai<1.0.0` are clear and provide the necessary steps for patching a global client.
65-88: The instructions for using async clients are clear and provide the necessary steps for patching an async client. The example provided is also correct.
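For reference, a minimal sketch of the sync and async patching flow these README comments describe, assuming the `patch`/`apatch` helpers keep the shape shown in this PR's tests; the model name, prompt, and expected values are illustrative, and `apatch` is the async variant being introduced here rather than a long-standing API.

```python
import instructor
from openai import OpenAI, AsyncOpenAI
from pydantic import BaseModel

class UserExtract(BaseModel):
    name: str
    age: int

# Sync: patch the client so create() accepts response_model and returns a Pydantic model.
client = instructor.patch(OpenAI())
user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserExtract,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
assert isinstance(user, UserExtract)

# Async: apatch is assumed to mirror patch() for AsyncOpenAI clients.
aclient = instructor.apatch(AsyncOpenAI())
```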
instructor/patch.py (1)
212-214: The `wrap_chatcompletion` function is called with `is_async=True` only for `client.chat.completions.create`. Ensure that other async methods of the client are also wrapped with `is_async=True` if needed.
tests/test_patch.py (4)
1-5: Imports are correctly placed and organized according to PEP8 guidelines.
8-9: The `OpenAI` and `AsyncOpenAI` clients are correctly patched with the `instructor` module.
12-30: The `test_async_runmodel` function is correctly defined as an async function and marked with `pytest.mark.asyncio`. The `UserExtract` model is defined correctly and used as the `response_model` in the `aclient.chat.completions.create` call. The assertions are correctly checking the instance type of the model, the value of the `name` attribute, and the existence of the `_raw_response` attribute.
32-32: The `test_runmodel` function is not implemented. Ensure to add the implementation or remove the function if it's not needed.
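A hedged sketch of the async test shape the 12-30 comment describes; the prompt, model name, and expected values are illustrative, not copied from the PR.

```python
import pytest
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

aclient = instructor.apatch(AsyncOpenAI())

class UserExtract(BaseModel):
    name: str
    age: int

@pytest.mark.asyncio
async def test_async_runmodel():
    model = await aclient.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserExtract,
        messages=[{"role": "user", "content": "Extract jason is 25 years old"}],
    )
    # Mirrors the assertions the review describes: type, field value, raw response.
    assert isinstance(model, UserExtract)
    assert model.name.lower() == "jason"
    assert hasattr(model, "_raw_response")
```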
instructor/patch.py (Outdated)
@@ -134,9 +134,7 @@ def retry_sync(
        raise e
Ensure that the exception `e` is logged for debugging purposes before it's raised again.
- raise e
+ logging.exception(e)
+ raise e
Committable suggestion
[!IMPORTANT]
Carefully review the code before committing. Make sure it correctly replaces the highlighted code, has no missing lines and indentation issues.
logging.exception(e)
raise e
instructor/patch.py (Outdated)
def wrap_chatcompletion(func: Callable, is_async: bool = None) -> Callable:
    @wraps(func)
    async def new_chatcompletion_async(
        response_model=None,
The `is_async` parameter is not documented in the function docstring. Add a description for `is_async` in the function docstring.
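One possible shape for that docstring; the wording is illustrative, not from the PR, and the note about how `None` is treated is an assumption.

```python
from functools import wraps
from typing import Callable

def wrap_chatcompletion(func: Callable, is_async: bool = None) -> Callable:
    """Wrap a chat-completion callable so it accepts a `response_model` argument.

    Args:
        func: The original `chat.completions.create` callable to wrap.
        is_async: If True, install the async wrapper; if False, the sync wrapper.
            How None is handled is an assumption here (e.g. inferred from `func`).
    """
    @wraps(func)
    def passthrough(*args, **kwargs):  # placeholder body for the sketch
        return func(*args, **kwargs)
    return passthrough
```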
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (3)
- tutorials/1.introduction.ipynb (1 hunks)
- tutorials/2.tips.ipynb (1 hunks)
- tutorials/3.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (3)
- tutorials/1.introduction.ipynb (Error: diff too large)
- tutorials/2.tips.ipynb (Error: diff too large)
- tutorials/3.applications-rag.ipynb (Error: diff too large)
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tutorials/4.knowledge-graphs.ipynb (1 hunks)
Files not summarized due to errors (1)
- tutorials/4.knowledge-graphs.ipynb: Error: Message exceeds token limit
Files not reviewed due to errors (1)
- tutorials/4.knowledge-graphs.ipynb (Error: diff too large)
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tutorials/5.validation.ipynb (1 hunks)
Files not reviewed due to errors (1)
- tutorials/5.validation.ipynb (Error: diff too large)
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tutorials/4.knowledge-graphs.ipynb (1 hunks)
Additional comments: 10
tutorials/4.knowledge-graphs.ipynb (10)
7-28: The introduction to knowledge graphs is clear and informative, providing a good foundation for the rest of the tutorial.
49-50: The explanation of the libraries used is concise and gives the reader a clear understanding of their purpose in the tutorial.
67-70: The import statements and the client initialization look correct, but it's important to ensure that the `instructor` library's `patch` function is compatible with the `OpenAI` class from the `openai` library.
97-109: The `Node` and `Edge` classes are well-defined using Pydantic, which will help with data validation and error handling.
139-152: The `KnowledgeGraph` class and its `visualize_knowledge_graph` method are well implemented. The use of the `graphviz` library for visualization is appropriate.
174-184: The `generate_graph` function is designed to interact with an AI model to generate a knowledge graph. Ensure that the `client.chat.completions.create` method exists and that the `response_model` parameter correctly handles the conversion to a `KnowledgeGraph` object.
370-385: The addition of the `__hash__` method to the `Node` and `Edge` classes is a good practice for handling duplicates, especially when using these objects in sets.
395-415: The `KnowledgeGraph` class has been updated to make the `nodes` and `edges` fields optional, which adds flexibility. The `update` method is a good addition for merging graphs. However, ensure that the deduplication logic in the `update` method works as intended, as combining lists and converting to a set may not work if the objects are not hashable or if their equality is not defined correctly.
514-521: The example use case for generating a knowledge graph from text chunks is a practical demonstration of the iterative graph generation process.
535-590: The conclusion provides a good summary of the tutorial's content and suggests further exercises for the reader, which is a great way to encourage practice and deeper understanding.
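To make the 370-415 comments concrete, a hedged sketch of hashable `Node`/`Edge` models and a deduplicating `update` method; the field names and merge logic are inferred from the review text, not copied from the notebook.

```python
from typing import List, Optional
from pydantic import BaseModel, Field

class Node(BaseModel):
    id: int
    label: str

    def __hash__(self) -> int:  # explicit __hash__ so Node works in a set
        return hash((self.id, self.label))

class Edge(BaseModel):
    source: int
    target: int
    label: str

    def __hash__(self) -> int:
        return hash((self.source, self.target, self.label))

class KnowledgeGraph(BaseModel):
    # Optional fields with list defaults, as the 395-415 comment describes.
    nodes: Optional[List[Node]] = Field(default_factory=list)
    edges: Optional[List[Edge]] = Field(default_factory=list)

    def update(self, other: "KnowledgeGraph") -> "KnowledgeGraph":
        # Deduplication relies on the __hash__ methods (plus Pydantic's field-wise __eq__).
        return KnowledgeGraph(
            nodes=list(set(self.nodes + other.nodes)),
            edges=list(set(self.edges + other.edges)),
        )
```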
tutorials/4.knowledge-graphs.ipynb (Outdated)
"metadata": {},
"outputs": [],
"source": [
"!pip install instructor graphviz --quiet"
The installation command for the required libraries is correct, but it's always a good practice to specify the version of the libraries to ensure compatibility.
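For instance, a pinned variant of the notebook install cell (version numbers purely illustrative):

```python
!pip install "instructor==0.3.1" "graphviz==0.20.1" --quiet
```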
tutorials/4.knowledge-graphs.ipynb (Outdated)
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n",
"    # Initialize an empty KnowledgeGraph\n",
"    cur_state = KnowledgeGraph()\n",
"\n",
"    # Iterate over the input list\n",
"    for i, inp in enumerate(input):\n",
"        new_updates = client.chat.completions.create(\n",
"            model=\"gpt-3.5-turbo-16k\",\n",
"            messages=[\n",
"                {\n",
"                    \"role\": \"system\",\n",
"                    \"content\": \"\"\"You are an iterative knowledge graph builder.\n",
"                    You are given the current state of the graph, and you must append the nodes and edges \n",
"                    to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n",
"                },\n",
"                {\n",
"                    \"role\": \"user\",\n",
"                    \"content\": f\"\"\"Extract any new nodes and edges from the following:\n",
"                    # Part {i}/{len(input)} of the input:\n",
"\n",
"                    {inp}\"\"\",\n",
"                },\n",
"                {\n",
"                    \"role\": \"user\",\n",
"                    \"content\": f\"\"\"Here is the current state of the graph:\n",
"                    {cur_state.model_dump_json(indent=2)}\"\"\",\n",
"                },\n",
"            ],\n",
"            response_model=KnowledgeGraph,\n",
"        )  # type: ignore\n",
"\n",
"        # Update the current state with the new updates\n",
"        cur_state = cur_state.update(new_updates)\n",
"\n",
"        # Draw the current state of the graph\n",
"        cur_state.visualize_knowledge_graph()\n",
"\n",
"    # Return the final state of the KnowledgeGraph\n",
"    return cur_state\n"
The iterative `generate_graph` function is a significant improvement for handling larger datasets. The use of the `update` method to merge the graphs is a good approach. However, ensure that the `client.chat.completions.create` method can handle the iterative prompts and that the `response_model` parameter is used correctly. Also, the comment in the `system` role message has a typo ("procide" should be "provide").
- to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n",
+ to it. Do not provide any duplicates and try to reuse nodes as much as possible.\"\"\",\n",
Committable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n", | |
" # Initialize an empty KnowledgeGraph\n", | |
" cur_state = KnowledgeGraph()\n", | |
"\n", | |
" # Iterate over the input list\n", | |
" for i, inp in enumerate(input):\n", | |
" new_updates = client.chat.completions.create(\n", | |
" model=\"gpt-3.5-turbo-16k\",\n", | |
" messages=[\n", | |
" {\n", | |
" \"role\": \"system\",\n", | |
" \"content\": \"\"\"You are an iterative knowledge graph builder.\n", | |
" You are given the current state of the graph, and you must append the nodes and edges \n", | |
" to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n", | |
" },\n", | |
" {\n", | |
" \"role\": \"user\",\n", | |
" \"content\": f\"\"\"Extract any new nodes and edges from the following:\n", | |
" # Part {i}/{len(input)} of the input:\n", | |
"\n", | |
" {inp}\"\"\",\n", | |
" },\n", | |
" {\n", | |
" \"role\": \"user\",\n", | |
" \"content\": f\"\"\"Here is the current state of the graph:\n", | |
" {cur_state.model_dump_json(indent=2)}\"\"\",\n", | |
" },\n", | |
" ],\n", | |
" response_model=KnowledgeGraph,\n", | |
" ) # type: ignore\n", | |
"\n", | |
" # Update the current state with the new updates\n", | |
" cur_state = cur_state.update(new_updates)\n", | |
"\n", | |
" # Draw the current state of the graph\n", | |
" cur_state.visualize_knowledge_graph() \n", | |
" \n", | |
" # Return the final state of the KnowledgeGraph\n", | |
" return cur_state\n" | |
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n", | |
" # Initialize an empty KnowledgeGraph\n", | |
" cur_state = KnowledgeGraph()\n", | |
"\n", | |
" # Iterate over the input list\n", | |
" for i, inp in enumerate(input):\n", | |
" new_updates = client.chat.completions.create(\n", | |
" model=\"gpt-3.5-turbo-16k\",\n", | |
" messages=[\n", | |
" {\n", | |
" \"role\": \"system\",\n", | |
" \"content\": \"\"\"You are an iterative knowledge graph builder.\n", | |
" You are given the current state of the graph, and you must append the nodes and edges \n", | |
" to it. Do not provide any duplicates and try to reuse nodes as much as possible.\"\"\",\n", | |
" },\n", | |
" {\n", | |
" \"role\": \"user\",\n", | |
" \"content\": f\"\"\"Extract any new nodes and edges from the following:\n", | |
" # Part {i}/{len(input)} of the input:\n", | |
"\n", | |
" {inp}\"\"\",\n", | |
" },\n", | |
" {\n", | |
" \"role\": \"user\",\n", | |
" \"content\": f\"\"\"Here is the current state of the graph:\n", | |
" {cur_state.model_dump_json(indent=2)}\"\"\",\n", | |
" },\n", | |
" ],\n", | |
" response_model=KnowledgeGraph,\n", | |
" ) # type: ignore\n", | |
"\n", | |
" # Update the current state with the new updates\n", | |
" cur_state = cur_state.update(new_updates)\n", | |
"\n", | |
" # Draw the current state of the graph\n", | |
" cur_state.visualize_knowledge_graph() \n", | |
" \n", | |
" # Return the final state of the KnowledgeGraph\n", | |
" return cur_state\n" |
Review Status
Actionable comments generated: 6
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tutorials/1.introduction.ipynb (1 hunks)
Additional comments: 12
tutorials/1.introduction.ipynb (12)
1-14: The introduction is clear and sets the stage for the problem that the notebook aims to solve. It's good practice to link to additional resources for users who want more background information.
20-22: The explanation of the problem with working with raw JSON is clear and sets up the rationale for using a library like Pydantic.
26-34: This code block demonstrates the potential inconsistencies in JSON data. It's a good example of the kind of data issues Pydantic can help address.
111-119: This code block effectively demonstrates how to define a Pydantic model and instantiate it with valid data.
186-187: This code block demonstrates Pydantic's validation error messages, which is useful for understanding how Pydantic handles invalid data.
201-232: This section introduces integration with the OpenAI API, which is a practical application of Pydantic models for handling structured data from an external service.
298-313: Here, the notebook demonstrates extending a Pydantic model to include additional fields and shows how to handle more complex data structures. This is a good demonstration of Pydantic's extensibility.
320-327: The explanation of function calling and its use in the context of the OpenAI API is informative and relevant to the notebook's topic.
381-402: This code block demonstrates generating a JSON schema from a Pydantic model, which is a powerful feature for documentation and validation purposes.
443-454: The example of defining nested schemas with Pydantic is excellent, showing the library's capability to handle complex data structures with ease.
461-462: The conclusion ties back to the library `instructor` and its purpose, providing context for the examples given in the notebook.
482-499: This final code block shows how to integrate the `instructor` library with the OpenAI SDK, demonstrating a practical application of the concepts discussed in the notebook.
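As a refresher on the pattern the 381-402 comment refers to, a self-contained sketch of generating a JSON schema from a Pydantic model, assuming Pydantic v2 (which the tutorials appear to use); the `Person` model is illustrative.

```python
import json
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Pydantic v2 exposes the JSON schema of a model as a plain dict.
print(json.dumps(Person.model_json_schema(), indent=2))
```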
Review Status
Actionable comments generated: 18
Configuration used: CodeRabbit UI
Files selected for processing (5)
- tutorials/1.introduction.ipynb (1 hunks)
- tutorials/2.tips.ipynb (1 hunks)
- tutorials/3.applications-rag.ipynb (1 hunks)
- tutorials/4.knowledge-graphs.ipynb (1 hunks)
- tutorials/5.validation.ipynb (1 hunks)
Files not summarized due to errors (3)
- tutorials/1.introduction.ipynb: Error: Message exceeds token limit
- tutorials/4.knowledge-graphs.ipynb: Error: Message exceeds token limit
- tutorials/5.validation.ipynb: Error: Message exceeds token limit
Additional comments: 31
tutorials/2.tips.ipynb (6)
34-83: The code demonstrates the use of Enums in Pydantic models and how to use them with OpenAI's API. The `House` enum is correctly defined without using `auto()` to ensure the values are meaningful strings rather than integers. The `Character` model is then used in a request to the OpenAI API, and the response is dumped using `resp.model_dump()`. This is a good practice for ensuring that the response adheres to the expected schema.
86-118: This cell shows an alternative approach using Literals instead of Enums. This is appropriate for cases where the set of values is small and fixed. The code is correct and follows best practices for type hinting in Pydantic.
132-177: The code cell introduces a way to handle arbitrary properties by defining a `Property` model and including a list of these in the `Character` model. This is a flexible approach that can handle various data without needing to know the exact schema beforehand. The use of a list of `Property` objects within the `Character` model is a good example of nested models in Pydantic.
258-315: The code cell shows how to define multiple entities within a single model. This is a common pattern when dealing with collections of items in APIs and is correctly implemented here. [APPROVED]
329-368: This cell demonstrates defining relationships between entities using lists of references (in this case, user IDs to represent friends). This is a common pattern in data modeling and is well implemented here.
372-511: The final code cell uses Graphviz to visualize the relationships between entities. This is a practical example of how to use Python libraries to create visual representations of data structures. The code correctly checks to avoid duplicating edges by ensuring that the friend ID is greater than the user ID before adding an edge.
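A hedged sketch of the three patterns these comments describe (string-valued Enum, `Literal`, and a nested `Property` list); the class and field names follow the review text but are otherwise assumptions, not the notebook's exact code.

```python
from enum import Enum
from typing import List, Literal
from pydantic import BaseModel

# Enum variant: meaningful string values rather than auto() integers.
class House(str, Enum):
    gryffindor = "Gryffindor"
    hufflepuff = "Hufflepuff"
    ravenclaw = "Ravenclaw"
    slytherin = "Slytherin"

class Character(BaseModel):
    name: str
    house: House

# Literal variant: fine when the allowed values are few and fixed.
class CharacterLiteral(BaseModel):
    name: str
    house: Literal["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

# Arbitrary key/value properties via a nested model list.
class Property(BaseModel):
    key: str
    value: str

class CharacterWithProperties(BaseModel):
    name: str
    properties: List[Property]
```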
tutorials/3.applications-rag.ipynb (16)
7-26: The introduction to RAG models is clear and informative. It provides a good foundation for readers who are new to the concept. The use of an image to illustrate the process is also helpful.
33-55: The section on the limitations of simple RAG models is well-structured and provides concrete examples of the challenges faced when using such models. This sets the stage for the subsequent sections on improving RAG models.
62-72: The explanation of query understanding as a solution to improve RAG models is concise and to the point. The accompanying image likely adds value by visually representing the concept.
86-101: The introduction to the `instructor` library and its integration with the OpenAI API is clear. However, ensure that the `OpenAI` class and its methods are up to date with the current OpenAI API.
128-131: The `Extraction` model is well-defined using Pydantic, which will help in validating the structured data. The use of `Field` with descriptions is a good practice for documentation and clarity.
168-203: The code example demonstrates how to use the `instructor` library to create structured outputs from a text chunk. The output is clearly printed, showing the structured data. Ensure that `model="gpt-4-1106-preview"` is a valid model identifier and that the `response_model` parameter is correctly implemented in the `instructor` library.
210-210: The explanation of how embedding summaries, hypothetical questions, and keywords can improve the search results is insightful and demonstrates a practical application of structured output.
217-236: The introduction of temporal context to queries is a good example of how structured output can be used to enhance query understanding. The `DateRange` and `Query` models are well-defined.
243-252: The explanation of how the structured query can be used to optimize backend search results is clear and provides a good use case for the models defined earlier.
275-290: The code example for adding temporal context to a query is well-structured. However, ensure that `model="gpt-4-1106-preview"` is still a valid model identifier and that the `response_model` parameter is correctly implemented in the `instructor` library.
353-359: The explanation of how personal assistants can benefit from structured output to handle parallel processing and fetch information from multiple backends is insightful and sets the stage for a practical example. [APPROVED]
368-379: The `SearchClient` and `Retrival` models are well-defined using Pydantic. This structured approach will help in validating the data and ensuring that the queries are well-formed.
424-432: The code example for dispatching queries to different backends using structured output is clear. Ensure that `model="gpt-4-1106-preview"` is still a valid model identifier and that the `response_model` parameter is correctly implemented in the `instructor` library.
504-517: The section on decomposing questions into sub-questions is a complex but valuable example of using structured output to enhance query understanding. It shows how to break down a problem into smaller, manageable parts.
568-588: The `Question` and `QueryPlan` models are well-defined, and the code example demonstrates how to use structured output to decompose a complex query into sub-questions. Ensure that `model="gpt-4-1106-preview"` is still a valid model identifier and that the `response_model` parameter is correctly implemented in the `instructor` library.
595-597: The closing remarks summarize the section well and highlight the potential of structured outputs in leveraging language models for system components.
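For the temporal-context pattern discussed in the 217-290 comments, a minimal sketch of what `DateRange` and `Query` response models might look like; the field names and types are assumptions based on the review's description, not the notebook itself.

```python
import datetime
from typing import Optional
from pydantic import BaseModel, Field

class DateRange(BaseModel):
    start: Optional[datetime.date] = None
    end: Optional[datetime.date] = None

class Query(BaseModel):
    rewritten_query: str = Field(description="Query rewritten for the search backend")
    published_daterange: Optional[DateRange] = Field(
        default=None, description="Date range implied by the question, if any"
    )
```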
tutorials/4.knowledge-graphs.ipynb (4)
58-61: The `instructor` library is being used to patch the `OpenAI` client. This is a good use of a wrapper to extend functionality, but it's important to ensure that the `instructor` library is actively maintained and compatible with the version of the `OpenAI` client being used. If the `instructor` library modifies the behavior of the `OpenAI` client, it could potentially introduce unexpected side effects or bugs.
127-143: The `visualize_knowledge_graph` method uses the `graphviz` library to visualize the knowledge graph. This is a good approach for generating visual representations of graphs. However, ensure that the `graphviz` library is installed in the environment where this notebook will be run, as it is an external dependency.
896-903: The example use case provided at the end of the notebook demonstrates how to use the `generate_graph` function with a list of text chunks. This is a good demonstration of the iterative graph generation process. However, ensure that the `generate_graph` function is fully tested and handles various input cases correctly, especially with regard to error handling and API response parsing mentioned earlier.
917-972: The conclusion provides a summary of what was covered in the tutorial and suggests exercises for the reader. This is a good educational practice as it encourages the reader to apply what they've learned. However, ensure that the examples provided in the exercises are feasible with the current implementation of the `KnowledgeGraph` class and related functions. If additional functionality is required to complete these exercises, consider providing that in the tutorial or as supplementary material.
tutorials/5.validation.ipynb (5)
16-20: The explanation of Pydantic's role in validation and the `instructor` library's extension of its capabilities is clear and informative. It's important to ensure that the links to external documentation are kept up-to-date to maintain the usefulness of this tutorial.
65-68: The example validator function `name_must_contain_space` is a good demonstration of custom validation logic. It's simple and demonstrates the concept effectively.
367-368: The use of Pydantic's `Field` constraints to limit the length of a message is a good example of using built-in validators. This is a simple and effective way to enforce message length constraints.
417-431: The use of context in the `AnswerWithCitation` model is a sophisticated example of validation that ensures the citation is present in the provided text chunk. This is a good practice for ensuring the accuracy of referenced information.
796-805: The conclusion provides a good summary of the tutorial's content and offers a to-do list for further exploration. This is a great way to encourage continued learning and application of the concepts covered in the tutorial.

Overall, the tutorial content is well-structured and covers important aspects of data validation and integration with OpenAI's API. The use of Pydantic and the `instructor` library is well-explained, and the examples are practical and relevant. It's important to ensure that the code is robust, handles edge cases, and performs well under different conditions. Additionally, the tutorial should encourage best practices such as error handling, caching, and performance optimization.
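To ground the 65-68 and 367-368 comments, a small sketch combining a custom validator with a built-in `Field` length constraint, assuming Pydantic v2 (`field_validator`); the model and field names are illustrative, not copied from the notebook.

```python
from typing import Annotated
from pydantic import BaseModel, Field, field_validator

class UserDetail(BaseModel):
    # Built-in constraint: reject names longer than 20 characters.
    name: Annotated[str, Field(max_length=20)]

    @field_validator("name")
    @classmethod
    def name_must_contain_space(cls, v: str) -> str:
        if " " not in v:
            raise ValueError("name must contain a space")
        return v
```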
"We have a `name` field, which is a string, and an `age` field, which is an integer. However, if we were to load this into a dictionary, we would have no way of knowing if the data is valid. For example, we could have a string for the age, or we could have a float for the age. We could also have a string for the name, or we could have a list for the name." | ||
] |
The explanation here is good, but it could be enhanced by explicitly stating that dictionaries in Python do not enforce types or schemas, which is why Pydantic is useful.
"source": [ | ||
"for obj in data:\n", | ||
" name = obj.get(\"first_name\")\n", | ||
" age = obj.get(\"age\")\n", | ||
" print(f\"{name} is {age}\")\n", | ||
" print(f\"Next year he will be {age+1} years old\")" |
This code block is meant to demonstrate the failure that occurs when types are not enforced. However, the error message in the traceback (lines 59-66) indicates a type error due to string concatenation with an integer, which is a different issue than what's being discussed. It would be more illustrative to show an error that directly results from incorrect types associated with the JSON keys.
"## Pydantic to the rescue\n", | ||
"\n", | ||
"Pydantic is a library that allows us to define data structures, and then validate them. It also allows us to define data structures." |
The introduction to Pydantic is succinct and to the point. It might be beneficial to mention that Pydantic can also be used for parsing and exporting data, which is a significant part of its functionality.
"person = Person.model_validate({\"name\": \"Sam\", \"age\": \"30\"})\n", | ||
"person" |
The use of `Person.model_validate` is not standard Pydantic usage. Typically, you would instantiate a model directly with `Person(**data)`. If `model_validate` is a custom method, it should be clearly documented, or standard Pydantic practices should be used for clarity.
"assert person.name == \"Sam\"\n", | ||
"assert person.age == 20" |
The assertion here is incorrect as it checks `person.age == 20` while the instantiated `person` has an age of 30. This will raise an `AssertionError`.
- assert person.age == 20
+ assert person.age == 30
Committable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
"assert person.name == \"Sam\"\n", | |
"assert person.age == 20" | |
"assert person.name == \"Sam\"\n", | |
"assert person.age == 30" |
"def validate_chain_of_thought(values):\n", | ||
" chain_of_thought = values[\"chain_of_thought\"]\n", | ||
" answer = values[\"answer\"]\n", | ||
" resp = client.chat.completions.create(\n", | ||
" model=\"gpt-4-1106-preview\",\n", | ||
" messages=[\n", | ||
" {\n", | ||
" \"role\": \"system\",\n", | ||
" \"content\": \"You are a validator. Determine if the value follows from the statement. If it is not, explain why.\",\n", | ||
" },\n", | ||
" {\n", | ||
" \"role\": \"user\",\n", | ||
" \"content\": f\"Verify that `{answer}` follows the chain of thought: {chain_of_thought}\",\n", | ||
" },\n", | ||
" ],\n", | ||
" response_model=Validation,\n", | ||
" )\n", | ||
" if not resp.is_valid:\n", | ||
" raise ValueError(resp.error_message)\n", | ||
" return values" |
The `validate_chain_of_thought` function demonstrates an advanced use case of integrating LLMs into the validation process. It's important to ensure that the LLM's responses are interpreted correctly and that the error messages are clear and actionable.
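The snippet above relies on a `Validation` response model that is not shown in this excerpt; a minimal sketch of what it presumably contains, inferred from the `resp.is_valid` and `resp.error_message` accesses rather than the notebook's actual definition:

```python
from typing import Optional
from pydantic import BaseModel, Field

class Validation(BaseModel):
    is_valid: bool = Field(description="Whether the answer follows from the chain of thought")
    error_message: Optional[str] = Field(
        default=None, description="Explanation when the answer does not follow"
    )
```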
"class QuestionAnswer(BaseModel):\n", | ||
" question: str\n", | ||
" answer: str\n", | ||
"\n", | ||
"question = \"What is the meaning of life?\"\n", | ||
"context = \"The according to the devil the meaning of life is a life of sin and debauchery.\"\n", | ||
"\n", | ||
"\n", | ||
"resp = client.chat.completions.create(\n", | ||
" model=\"gpt-4-1106-preview\",\n", | ||
" response_model=QuestionAnswer,\n", | ||
" messages=[\n", | ||
" {\n", | ||
" \"role\": \"system\",\n", | ||
" \"content\": \"You are a system that answers questions based on the context. answer exactly what the question asks using the context.\",\n", | ||
" },\n", | ||
" {\n", | ||
" \"role\": \"user\",\n", | ||
" \"content\": f\"using the context: `{context}`\\n\\nAnswer the following question: `{question}`\",\n", | ||
" },\n", | ||
" ],\n", | ||
")\n", | ||
"\n", | ||
"resp.answer" |
The example of using the `response_model` parameter with the OpenAI API is a good demonstration of how to integrate structured output with the API. It's important to ensure that the `response_model` is designed to handle all possible responses from the API, including errors and edge cases.
"from pydantic import BeforeValidator\n", | ||
"\n", | ||
"class QuestionAnswer(BaseModel):\n", | ||
" question: str\n", | ||
" answer: Annotated[\n", | ||
" str,\n", | ||
" BeforeValidator(\n", | ||
" llm_validator(\"don't say objectionable things\")\n", | ||
" ),\n", | ||
" ]\n", | ||
"\n", | ||
"resp = client.chat.completions.create(\n", | ||
" model=\"gpt-3.5-turbo\",\n", | ||
" response_model=QuestionAnswer,\n", | ||
" max_retries=2,\n", | ||
" messages=[\n", | ||
" {\n", | ||
" \"role\": \"system\",\n", | ||
" \"content\": \"You are a system that answers questions based on the context. answer exactly what the question asks using the context.\",\n", | ||
" },\n", | ||
" {\n", | ||
" \"role\": \"user\",\n", | ||
" \"content\": f\"using the context: `{context}`\\n\\nAnswer the following question: `{question}`\",\n", | ||
" },\n", | ||
" ],\n", | ||
")\n", | ||
"\n", | ||
"resp.answer" |
The use of `BeforeValidator` and `llm_validator` to ensure that responses do not contain objectionable content is a critical aspect of responsible AI deployment. It's important to test these validators thoroughly to ensure they are effective and do not over-filter content.
"resp = client.chat.completions.create(\n", | ||
" model=\"gpt-3.5-turbo\",\n", | ||
" messages=[\n", | ||
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n", |
There is a typo in the word "yesturday" which should be corrected to "yesterday" to ensure the date is understood correctly.
- \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n",
+ \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json today is {datetime.date.today()}\"},\n",
Committable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n", | |
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json today is {datetime.date.today()}\"},\n", |
" \"content\": f\"\"\"\n", | ||
" Today is {datetime.date.today()} \n", | ||
"\n", | ||
" Extract `Jason Liu is thirty years old his birthday is yesturday` \n", |
The word "yesturday" is misspelled and should be corrected to "yesterday" for proper date parsing.
- Extract `Jason Liu is thirty years old his birthday is yesturday`
+ Extract `Jason Liu is thirty years old his birthday is yesterday`
Committable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
" Extract `Jason Liu is thirty years old his birthday is yesturday` \n", | |
" Extract `Jason Liu is thirty years old his birthday is yesterday` \n", |
Summary by CodeRabbit

New Features

Documentation
- `openai<1.0.0` and `instructor` libraries.
- `UserExtract` class.
- `response_model` usage.

Refactor
- `apatch` to the public interface in the `instructor` library for async support.
- `wrap_chatcompletion` function to accept an `is_async` parameter.

Tests
- `pytest.mark.asyncio`.
- `UserExtract` model and async OpenAI client interactions.

Chores