
Add tutorials #154

Merged
jxnl merged 14 commits into main from tutorials on Nov 19, 2023

Conversation

@jxnl (Collaborator) commented on Nov 7, 2023

Summary by CodeRabbit

  • New Features

    • Introduced a new section on requesting JSON data from OpenAI.
    • Added examples of using Pydantic for data validation and schema definition.
    • Showcased the integration of OpenAI's ChatCompletion API for structured data extraction.
    • Demonstrated the creation of nested complex schemas with Pydantic.
  • Documentation

    • Updated README with instructions for using openai<1.0.0 and instructor libraries.
    • Added usage examples for async clients and new UserExtract class.
    • Enhanced project documentation with async client notes and response_model usage (a minimal usage sketch follows this summary).
    • Included a section on Pydantic validation and error handling in documentation.
  • Refactor

    • Added apatch to the public interface of the instructor library for async support.
    • Modified wrap_chatcompletion function to accept an is_async parameter.
  • Tests

    • Created new async test functions and integrated pytest.mark.asyncio.
    • Added tests for UserExtract model and async OpenAI client interactions.
  • Chores

    • Updated tutorials with practical examples of structured output for RAG models.
    • Enhanced knowledge graph handling with new methods and classes in tutorials.
    • Provided examples of accessing original responses in API calls in documentation.
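
To make the response_model flow summarized above concrete, here is a minimal sketch of the pattern the README and tests exercise with the openai>=1.0 client style. The prompt, model name, and exact field set are illustrative assumptions, not code copied from the PR.

import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserExtract(BaseModel):
    # Field names mirror the UserExtract model referenced in the tests; exact fields are assumed.
    name: str
    age: int

# patch() wraps the client so chat.completions.create accepts a response_model argument.
client = instructor.patch(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserExtract,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
assert isinstance(user, UserExtract)
print(user.name, user.age)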

coderabbitai bot (Contributor) commented on Nov 7, 2023

Walkthrough

The updates encompass the integration of Pydantic for structured data validation and schema definition, enhancements to the OpenAI SDK via the instructor library, and the introduction of async client capabilities. Tutorials have been enriched with practical examples of applying structured output to RAG models, and the documentation now includes guidance on using async clients and accessing original responses.

Changes

File Path | Change Summary
tutorials/1.introduction.ipynb | Added sections on Pydantic for data validation/schema definition and OpenAI's ChatCompletion API integration.
tutorials/2.applications-rag.ipynb | Introduced content on structured output application to RAG models and practical examples.
README.md | Updated usage instructions for the openai and instructor libraries, including async client examples.
docs/index.md | Updated documentation with async client usage and Pydantic validation examples.
instructor/__init__.py | Added apatch to the public interface.
instructor/patch.py | Added is_async parameter to wrap_chatcompletion and an apatch function.
tests/test_patch.py | Added async test functions and assertions for the UserExtract model.
tutorials/2.tips.ipynb | Included tips on schema engineering and classification methods using Enums and Literals.
tutorials/3.applications-rag.ipynb | Added content on improving RAG models and examples using the instructor library.
tutorials/4.knowledge-graphs.ipynb | Enhanced knowledge graph handling with new classes and methods.
tutorials/5.validation.ipynb | Implemented validation using Pydantic and the instructor library with new classes and validation scenarios.

🐇 In the code's weave, through autumn's bright leaves, 🍂
We hop to the patch, async in dispatch,
Structured and neat, our schemas complete,
With Pydantic's charm, we code without alarm. 🌟


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between a8a4b70 and b4e2126.
Files selected for processing (1)
  • tutorials/1.introduction.ipynb (1 hunks)
Files not reviewed due to errors (1)
  • tutorials/1.introduction.ipynb (Error: diff too large)

@jxnl added the documentation label (Improvements or additions to documentation) on Nov 8, 2023
@jxnl changed the title to Add tutorials on Nov 8, 2023
coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between b4e2126 and 144c792.
Files ignored due to filter (1)
  • pyproject.toml
Files selected for processing (1)
  • tutorials/2.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (1)
  • tutorials/2.applications-rag.ipynb (Error: diff too large)

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 144c792 and 1ec9114.
Files ignored due to filter (1)
  • pyproject.toml
Files selected for processing (7)
  • README.md (1 hunks)
  • docs/index.md (8 hunks)
  • instructor/__init__.py (2 hunks)
  • instructor/patch.py (2 hunks)
  • tests/test_patch.py (1 hunks)
  • tutorials/1.introduction.ipynb (1 hunks)
  • tutorials/2.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (1)
  • tutorials/1.introduction.ipynb (Error: diff too large)
Files skipped from review due to trivial changes (2)
  • docs/index.md
  • tutorials/2.applications-rag.ipynb
Additional comments: 10
instructor/__init__.py (2)
  • 1-4: The new import apatch from the .patch module is added. Ensure that the function or variable apatch is defined in the .patch module.

  • 11-17: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [6-17]

apatch is added to the __all__ list, which makes it part of the public interface of the instructor module. Make sure that this is the intended behavior and that the usage of apatch is documented.

README.md (3)
  • 45-45: The assertion is correct and ensures that the age is correctly extracted.

  • 48-62: The instructions for using openai<1.0.0 are clear and provide the necessary steps for patching a global client.

  • 65-88: The instructions for using async clients are clear and provide the necessary steps for patching an async client. The example provided is also correct.

instructor/patch.py (1)
  • 212-214: The wrap_chatcompletion function is called with is_async=True only for client.chat.completions.create. Ensure that other async methods of the client are also wrapped with is_async=True if needed.
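
For reference, a hypothetical sketch of the patch site being discussed; this is not the library's actual code, only an illustration of wrapping the async create method with is_async=True:

from openai import AsyncOpenAI

def apatch(client: AsyncOpenAI) -> AsyncOpenAI:
    # wrap_chatcompletion is the function under review in instructor/patch.py.
    # Only chat.completions.create is wrapped here; any other async entry points
    # patched later would also need is_async=True.
    client.chat.completions.create = wrap_chatcompletion(
        client.chat.completions.create, is_async=True
    )
    return client
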
tests/test_patch.py (4)
  • 1-5: Imports are correctly placed and organized according to PEP8 guidelines.

  • 8-9: The OpenAI and AsyncOpenAI clients are correctly patched with the instructor module.

  • 12-30: The test_async_runmodel function is correctly defined as an async function and marked with pytest.mark.asyncio. The UserExtract model is defined correctly and used as the response_model in the aclient.chat.completions.create call. The assertions correctly check the instance type of the returned model, the value of the name attribute, and the existence of the _raw_response attribute (a minimal sketch of this test pattern follows this list).

  • 32-32: The test_runmodel function is not implemented. Ensure to add the implementation or remove the function if it's not needed.
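
A minimal sketch of the async test pattern described above; the prompt and expected values are assumptions, not the PR's actual test code:

import pytest
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

# apatch() wraps the async client so create() accepts a response_model argument.
aclient = instructor.apatch(AsyncOpenAI())

class UserExtract(BaseModel):
    name: str
    age: int

@pytest.mark.asyncio
async def test_async_runmodel():
    model = await aclient.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserExtract,
        messages=[{"role": "user", "content": "Extract jason is 25 years old"}],
    )
    assert isinstance(model, UserExtract), "Should be an instance of UserExtract"
    assert model.name.lower() == "jason"
    assert hasattr(model, "_raw_response"), "The raw response should be attached"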

@@ -134,9 +134,7 @@ def retry_sync(
raise e
coderabbitai bot (Contributor) commented:

Ensure that the exception e is logged for debugging purposes before it's raised again.

- raise e
+ logging.exception(e)
+ raise e

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Make sure it correctly replaces the highlighted code, has no missing lines, and has no indentation issues.

Suggested change
raise e
logging.exception(e)
raise e

Comment on lines 137 to 140
def wrap_chatcompletion(func: Callable, is_async: bool = None) -> Callable:
    @wraps(func)
    async def new_chatcompletion_async(
        response_model=None,
coderabbitai bot (Contributor) commented:

The is_async parameter is not documented in the function docstring. Add a description for is_async in the function docstring.
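
For example, a docstring along these lines would cover it (the wording is a suggestion, not the library's text):

from typing import Callable

def wrap_chatcompletion(func: Callable, is_async: bool = False) -> Callable:
    """Wrap a chat completion callable so it accepts a response_model argument.

    Args:
        func: The original chat.completions.create callable to wrap.
        is_async: Whether func is a coroutine function; when True the async
            wrapper is returned instead of the sync one.
    """
    ...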

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 1ec9114 and 9905497.
Files selected for processing (3)
  • tutorials/1.introduction.ipynb (1 hunks)
  • tutorials/2.tips.ipynb (1 hunks)
  • tutorials/3.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (3)
  • tutorials/1.introduction.ipynb (Error: diff too large)
  • tutorials/2.tips.ipynb (Error: diff too large)
  • tutorials/3.applications-rag.ipynb (Error: diff too large)

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 9905497 and 075d6b9.
Files selected for processing (1)
  • tutorials/4.knowledge-graphs.ipynb (1 hunks)
Files not summarized due to errors (1)
  • tutorials/4.knowledge-graphs.ipynb: Error: Message exceeds token limit
Files not reviewed due to errors (1)
  • tutorials/4.knowledge-graphs.ipynb (Error: diff too large)

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 075d6b9 and e7b5992.
Files selected for processing (1)
  • docs/index.md (1 hunks)
Files skipped from review due to trivial changes (1)
  • docs/index.md

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between e7b5992 and 37d82ae.
Files selected for processing (1)
  • tutorials/5.validation.ipynb (1 hunks)
Files not reviewed due to errors (1)
  • tutorials/5.validation.ipynb (Error: diff too large)

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 37d82ae and ea6de65.
Files selected for processing (1)
  • tutorials/4.knowledge-graphs.ipynb (1 hunks)
Additional comments: 10
tutorials/4.knowledge-graphs.ipynb (10)
  • 7-28: The introduction to knowledge graphs is clear and informative, providing a good foundation for the rest of the tutorial.

  • 49-50: The explanation of the libraries used is concise and gives the reader a clear understanding of their purpose in the tutorial.

  • 67-70: The import statements and the client initialization look correct, but it's important to ensure that the instructor library's patch function is compatible with the OpenAI class from the openai library.

  • 97-109: The Node and Edge classes are well-defined using Pydantic, which will help with data validation and error handling.

  • 139-152: The KnowledgeGraph class and its visualize_knowledge_graph method are well implemented. The use of the graphviz library for visualization is appropriate.

  • 174-184: The generate_graph function is designed to interact with an AI model to generate a knowledge graph. Ensure that the client.chat.completions.create method exists and that the response_model parameter correctly handles the conversion to a KnowledgeGraph object.

  • 370-385: The addition of the __hash__ method to the Node and Edge classes is a good practice for handling duplicates, especially when using these objects in sets.

  • 395-415: The KnowledgeGraph class has been updated to make the nodes and edges fields optional, which adds flexibility, and the update method is a good addition for merging graphs. However, ensure that the deduplication logic in update works as intended: combining lists and converting to a set only deduplicates if the objects are hashable and their equality is defined correctly (a minimal sketch of hashable nodes and set-based merging follows this list).

  • 514-521: The example use case for generating a knowledge graph from text chunks is a practical demonstration of the iterative graph generation process.

  • 535-590: The conclusion provides a good summary of the tutorial's content and suggests further exercises for the reader, which is a great way to encourage practice and deeper understanding.
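
A minimal sketch of hashable Node/Edge models and a set-based update, following the structure described in the comments above; the field names are illustrative assumptions, not copied from the notebook:

from typing import List, Optional
from pydantic import BaseModel, Field

class Node(BaseModel):
    id: int
    label: str

    def __hash__(self) -> int:
        # Hash on identity-defining fields so duplicates collapse inside a set.
        return hash((self.id, self.label))

class Edge(BaseModel):
    source: int
    target: int
    label: str

    def __hash__(self) -> int:
        return hash((self.source, self.target, self.label))

class KnowledgeGraph(BaseModel):
    nodes: Optional[List[Node]] = Field(default_factory=list)
    edges: Optional[List[Edge]] = Field(default_factory=list)

    def update(self, other: "KnowledgeGraph") -> "KnowledgeGraph":
        # set() only deduplicates because Node and Edge define __hash__;
        # equality still compares all fields, so near-duplicates survive.
        return KnowledgeGraph(
            nodes=list(set(self.nodes + other.nodes)),
            edges=list(set(self.edges + other.edges)),
        )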

"metadata": {},
"outputs": [],
"source": [
"!pip install instructor graphviz --quiet"
coderabbitai bot (Contributor) commented:

The installation command for the required libraries is correct, but it's always a good practice to specify the version of the libraries to ensure compatibility.

Comment on lines 449 to 487
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n",
" # Initialize an empty KnowledgeGraph\n",
" cur_state = KnowledgeGraph()\n",
"\n",
" # Iterate over the input list\n",
" for i, inp in enumerate(input):\n",
" new_updates = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo-16k\",\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"\"\"You are an iterative knowledge graph builder.\n",
" You are given the current state of the graph, and you must append the nodes and edges \n",
" to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Extract any new nodes and edges from the following:\n",
" # Part {i}/{len(input)} of the input:\n",
"\n",
" {inp}\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Here is the current state of the graph:\n",
" {cur_state.model_dump_json(indent=2)}\"\"\",\n",
" },\n",
" ],\n",
" response_model=KnowledgeGraph,\n",
" ) # type: ignore\n",
"\n",
" # Update the current state with the new updates\n",
" cur_state = cur_state.update(new_updates)\n",
"\n",
" # Draw the current state of the graph\n",
" cur_state.visualize_knowledge_graph() \n",
" \n",
" # Return the final state of the KnowledgeGraph\n",
" return cur_state\n"
coderabbitai bot (Contributor) commented:

The iterative generate_graph function is a significant improvement for handling larger datasets, and using the update method to merge the graphs is a good approach. However, ensure that the client.chat.completions.create method can handle the iterative prompts and that the response_model parameter is used correctly. Also, the system role message has typos ("procide" should be "provide", "duplcates" should be "duplicates").

-                    to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n",
+                    to it. Do not provide any duplicates and try to reuse nodes as much as possible.\"\"\",\n",

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n",
" # Initialize an empty KnowledgeGraph\n",
" cur_state = KnowledgeGraph()\n",
"\n",
" # Iterate over the input list\n",
" for i, inp in enumerate(input):\n",
" new_updates = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo-16k\",\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"\"\"You are an iterative knowledge graph builder.\n",
" You are given the current state of the graph, and you must append the nodes and edges \n",
" to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Extract any new nodes and edges from the following:\n",
" # Part {i}/{len(input)} of the input:\n",
"\n",
" {inp}\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Here is the current state of the graph:\n",
" {cur_state.model_dump_json(indent=2)}\"\"\",\n",
" },\n",
" ],\n",
" response_model=KnowledgeGraph,\n",
" ) # type: ignore\n",
"\n",
" # Update the current state with the new updates\n",
" cur_state = cur_state.update(new_updates)\n",
"\n",
" # Draw the current state of the graph\n",
" cur_state.visualize_knowledge_graph() \n",
" \n",
" # Return the final state of the KnowledgeGraph\n",
" return cur_state\n"
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n",
" # Initialize an empty KnowledgeGraph\n",
" cur_state = KnowledgeGraph()\n",
"\n",
" # Iterate over the input list\n",
" for i, inp in enumerate(input):\n",
" new_updates = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo-16k\",\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"\"\"You are an iterative knowledge graph builder.\n",
" You are given the current state of the graph, and you must append the nodes and edges \n",
" to it. Do not provide any duplicates and try to reuse nodes as much as possible.\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Extract any new nodes and edges from the following:\n",
" # Part {i}/{len(input)} of the input:\n",
"\n",
" {inp}\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Here is the current state of the graph:\n",
" {cur_state.model_dump_json(indent=2)}\"\"\",\n",
" },\n",
" ],\n",
" response_model=KnowledgeGraph,\n",
" ) # type: ignore\n",
"\n",
" # Update the current state with the new updates\n",
" cur_state = cur_state.update(new_updates)\n",
"\n",
" # Draw the current state of the graph\n",
" cur_state.visualize_knowledge_graph() \n",
" \n",
" # Return the final state of the KnowledgeGraph\n",
" return cur_state\n"

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 6

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between ea6de65 and 9774df5.
Files selected for processing (1)
  • tutorials/1.introduction.ipynb (1 hunks)
Additional comments: 12
tutorials/1.introduction.ipynb (12)
  • 1-14: The introduction is clear and sets the stage for the problem that the notebook aims to solve. It's good practice to link to additional resources for users who want more background information.

  • 20-22: The explanation of the problem with working with raw JSON is clear and sets up the rationale for using a library like Pydantic.

  • 26-34: This code block demonstrates the potential inconsistencies in JSON data. It's a good example of the kind of data issues Pydantic can help address.

  • 111-119: This code block effectively demonstrates how to define a Pydantic model and instantiate it with valid data.

  • 186-187: This code block demonstrates Pydantic's validation error messages, which is useful for understanding how Pydantic handles invalid data.

  • 201-232: This section introduces integration with the OpenAI API, which is a practical application of Pydantic models for handling structured data from an external service.

  • 298-313: Here, the notebook demonstrates extending a Pydantic model to include additional fields and shows how to handle more complex data structures. This is a good demonstration of Pydantic's extensibility.

  • 320-327: The explanation of function calling and its use in the context of the OpenAI API is informative and relevant to the notebook's topic.

  • 381-402: This code block demonstrates generating a JSON schema from a Pydantic model, which is a powerful feature for documentation and validation purposes.

  • 443-454: The example of defining nested schemas with Pydantic is excellent, showing the library's capability to handle complex data structures with ease (a minimal nested-schema sketch follows this list).

  • 461-462: The conclusion ties back to the library instructor and its purpose, providing context for the examples given in the notebook.

  • 482-499: This final code block shows how to integrate the instructor library with the OpenAI SDK, demonstrating a practical application of the concepts discussed in the notebook.
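
A minimal nested-schema sketch of the kind these cells cover, assuming Pydantic v2's model_json_schema; the field names are illustrative:

from typing import List
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address]

# Pydantic emits a JSON Schema for the whole nested structure, with Address under $defs.
print(Person.model_json_schema())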

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 9774df5 and 90d8a8c.
Files selected for processing (1)
  • docs/index.md (1 hunks)
Files skipped from review due to trivial changes (1)
  • docs/index.md

@jxnl merged commit 9269f23 into main on Nov 19, 2023
@jxnl deleted the tutorials branch on November 19, 2023 at 01:07
coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 18

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 90d8a8c and d02111d.
Files selected for processing (5)
  • tutorials/1.introduction.ipynb (1 hunks)
  • tutorials/2.tips.ipynb (1 hunks)
  • tutorials/3.applications-rag.ipynb (1 hunks)
  • tutorials/4.knowledge-graphs.ipynb (1 hunks)
  • tutorials/5.validation.ipynb (1 hunks)
Files not summarized due to errors (3)
  • tutorials/1.introduction.ipynb: Error: Message exceeds token limit
  • tutorials/4.knowledge-graphs.ipynb: Error: Message exceeds token limit
  • tutorials/5.validation.ipynb: Error: Message exceeds token limit
Additional comments: 31
tutorials/2.tips.ipynb (6)
  • 34-83: The code demonstrates the use of Enums in Pydantic models and how to use them with OpenAI's API. The House enum is correctly defined without using auto() to ensure the values are meaningful strings rather than integers. The Character model is then used in a request to the OpenAI API, and the response is dumped using resp.model_dump(). This is a good practice for ensuring that the response adheres to the expected schema.

  • 86-118: This cell shows an alternative approach using Literals instead of Enums, which is appropriate when the set of values is small and fixed. The code is correct and follows best practices for type hinting in Pydantic (a brief Enum-versus-Literal sketch follows this file's comments).

  • 132-177: The code cell introduces a way to handle arbitrary properties by defining a Property model and including a list of these in the Character model. This is a flexible approach that can handle various data without needing to know the exact schema beforehand. The use of a list of Property objects within the Character model is a good example of nested models in Pydantic.

  • 258-315: The code cell shows how to define multiple entities within a single model. This is a common pattern when dealing with collections of items in APIs and is correctly implemented here.
    [APPROVED]

  • 329-368: This cell demonstrates defining relationships between entities using lists of references (in this case, user IDs to represent friends). This is a common pattern in data modeling and is well implemented here.

  • 372-511: The final code cell uses Graphviz to visualize the relationships between entities. This is a practical example of how to use Python libraries to create visual representations of data structures. The code correctly checks to avoid duplicating edges by ensuring that the friend ID is greater than the user ID before adding an edge.
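
For reference, a brief Enum-versus-Literal sketch of the pattern these cells cover; the specific house values are illustrative assumptions, not the notebook's exact code:

from enum import Enum
from typing import Literal
from pydantic import BaseModel

class House(Enum):
    # Explicit string values (not auto()) so the schema carries meaningful names.
    GRYFFINDOR = "Gryffindor"
    HUFFLEPUFF = "Hufflepuff"
    RAVENCLAW = "Ravenclaw"
    SLYTHERIN = "Slytherin"

class CharacterWithEnum(BaseModel):
    name: str
    house: House

class CharacterWithLiteral(BaseModel):
    # Literal is the lighter-weight option when the value set is small and fixed.
    name: str
    house: Literal["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]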

tutorials/3.applications-rag.ipynb (16)
  • 7-26: The introduction to RAG models is clear and informative. It provides a good foundation for readers who are new to the concept. The use of an image to illustrate the process is also helpful.

  • 33-55: The section on the limitations of simple RAG models is well-structured and provides concrete examples of the challenges faced when using such models. This sets the stage for the subsequent sections on improving RAG models.

  • 62-72: The explanation of query understanding as a solution to improve RAG models is concise and to the point. The accompanying image likely adds value by visually representing the concept.

  • 86-101: The introduction to the instructor library and its integration with the OpenAI API is clear. However, ensure that the OpenAI class and its methods are up to date with the current OpenAI API.

  • 128-131: The Extraction model is well-defined using Pydantic, which will help in validating the structured data. The use of Field with descriptions is a good practice for documentation and clarity.

  • 168-203: The code example demonstrates how to use the instructor library to create structured outputs from a text chunk. The output is clearly printed, showing the structured data. Ensure that the model="gpt-4-1106-preview" is a valid model identifier and that the response_model parameter is correctly implemented in the instructor library.

  • 210-210: The explanation of how embedding summaries, hypothetical questions, and keywords can improve the search results is insightful and demonstrates a practical application of structured output.

  • 217-236: The introduction of temporal context to queries is a good example of how structured output can enhance query understanding. The DateRange and Query models are well-defined (a minimal sketch follows this file's comments).

  • 243-252: The explanation of how the structured query can be used to optimize backend search results is clear and provides a good use case for the models defined earlier.

  • 275-290: The code example for adding temporal context to a query is well-structured. However, ensure that the model="gpt-4-1106-preview" is still a valid model identifier and that the response_model parameter is correctly implemented in the instructor library.

  • 353-359: The explanation of how personal assistants can benefit from structured output to handle parallel processing and fetch information from multiple backends is insightful and sets the stage for a practical example.
    [APPROVED]

  • 368-379: The SearchClient and Retrival models are well-defined using Pydantic. This structured approach will help in validating the data and ensuring that the queries are well-formed.

  • 424-432: The code example for dispatching queries to different backends using structured output is clear. Ensure that the model="gpt-4-1106-preview" is still a valid model identifier and that the response_model parameter is correctly implemented in the instructor library.

  • 504-517: The section on decomposing questions into sub-questions is a complex but valuable example of using structured output to enhance query understanding. It shows how to break down a problem into smaller, manageable parts.

  • 568-588: The Question and QueryPlan models are well-defined, and the code example demonstrates how to use structured output to decompose a complex query into sub-questions. Ensure that the model="gpt-4-1106-preview" is still a valid model identifier and that the response_model parameter is correctly implemented in the instructor library.

  • 595-597: The closing remarks summarize the section well and highlight the potential of structured outputs in leveraging language models for system components.
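
A minimal sketch of the temporal-context models described above; the field names and descriptions are assumptions, not the notebook's exact definitions:

import datetime
from typing import Optional
from pydantic import BaseModel, Field

class DateRange(BaseModel):
    start: datetime.date
    end: datetime.date

class Query(BaseModel):
    rewritten_query: str = Field(description="Query rewritten for the search backend")
    published_daterange: Optional[DateRange] = Field(
        default=None, description="Only return documents published in this range"
    )

# A patched client could then fill this in via
# client.chat.completions.create(..., response_model=Query), as in the comments above.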

tutorials/4.knowledge-graphs.ipynb (4)
  • 58-61: The instructor library is being used to patch the OpenAI client. This is a good use of a wrapper to extend functionality, but it's important to ensure that the instructor library is actively maintained and compatible with the version of the OpenAI client being used. If the instructor library modifies the behavior of the OpenAI client, it could potentially introduce unexpected side effects or bugs.

  • 127-143: The visualize_knowledge_graph method uses the graphviz library to visualize the knowledge graph, which is a good approach for generating visual representations. However, ensure that graphviz is installed in the environment where this notebook will run, as it is an external dependency (a short visualization sketch follows this file's comments).

  • 896-903: The example use case provided at the end of the notebook demonstrates how to use the generate_graph function with a list of text chunks. This is a good demonstration of the iterative graph generation process. However, ensure that the generate_graph function is fully tested and handles various input cases correctly, especially with regard to error handling and API response parsing mentioned earlier.

  • 917-972: The conclusion provides a summary of what was covered in the tutorial and suggests exercises for the reader. This is a good educational practice as it encourages the reader to apply what they've learned. However, ensure that the examples provided in the exercises are feasible with the current implementation of the KnowledgeGraph class and related functions. If additional functionality is required to complete these exercises, consider providing that in the tutorial or as supplementary material.
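
A short sketch of the visualization step, reusing the Node/Edge/KnowledgeGraph models sketched earlier in this review; the output filename is an arbitrary assumption:

from graphviz import Digraph

def visualize_knowledge_graph(kg: "KnowledgeGraph") -> None:
    dot = Digraph(comment="Knowledge Graph")
    for node in kg.nodes:
        dot.node(str(node.id), node.label)
    for edge in kg.edges:
        dot.edge(str(edge.source), str(edge.target), label=edge.label)
    # Writes knowledge_graph.gv and opens the rendered image; requires the Graphviz system binaries.
    dot.render("knowledge_graph", view=True)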

tutorials/5.validation.ipynb (5)
  • 16-20: The explanation of Pydantic's role in validation and the instructor library's extension of its capabilities is clear and informative. It's important to ensure that the links to external documentation are kept up-to-date to maintain the usefulness of this tutorial.

  • 65-68: The example validator function name_must_contain_space is a good demonstration of custom validation logic: it is simple and demonstrates the concept effectively (a minimal field_validator sketch follows this file's comments).

  • 367-368: The use of Pydantic's Field constraints to limit the length of a message is a good example of using built-in validators. This is a simple and effective way to enforce message length constraints.

  • 417-431: The use of context in the AnswerWithCitation model is a sophisticated example of validation that ensures the citation is present in the provided text chunk, which is good practice for ensuring the accuracy of referenced information (a context-based validator sketch follows the summary below).

  • 796-805: The conclusion provides a good summary of the tutorial's content and offers a to-do list for further exploration. This is a great way to encourage continued learning and application of the concepts covered in the tutorial.
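
A minimal field_validator sketch of the kind of check described above, assuming Pydantic v2; the model name is illustrative:

from pydantic import BaseModel, field_validator

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_contain_space(cls, v: str) -> str:
        # Single-token names are rejected; the error message is what the caller
        # (or the LLM, on an instructor retry) sees.
        if " " not in v:
            raise ValueError("name must contain a space, e.g. first and last name")
        return v.title()

UserDetail(name="jason liu", age=30)   # ok, name normalized to "Jason Liu"
# UserDetail(name="jason", age=30)     # raises ValidationError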

Overall, the tutorial content is well-structured and covers important aspects of data validation and integration with OpenAI's API. The use of Pydantic and the instructor library is well-explained, and the examples are practical and relevant. It's important to ensure that the code is robust, handles edge cases, and performs well under different conditions. Additionally, the tutorial should encourage best practices such as error handling, caching, and performance optimization.
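
As a reference for the context-based check mentioned for AnswerWithCitation, here is a minimal sketch using Pydantic v2's validation context; any field names beyond those in the comment are assumptions:

from pydantic import BaseModel, ValidationInfo, field_validator

class AnswerWithCitation(BaseModel):
    answer: str
    citation: str

    @field_validator("citation")
    @classmethod
    def citation_must_appear_in_context(cls, v: str, info: ValidationInfo) -> str:
        # The source text is passed in via the validation context at call time.
        context = info.context or {}
        text_chunk = context.get("text_chunk", "")
        if v not in text_chunk:
            raise ValueError(f"citation `{v}` not found in the provided text chunk")
        return v

# Supplying the context at validation time:
# AnswerWithCitation.model_validate(data, context={"text_chunk": source_text})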

Comment on lines +41 to +42
"We have a `name` field, which is a string, and an `age` field, which is an integer. However, if we were to load this into a dictionary, we would have no way of knowing if the data is valid. For example, we could have a string for the age, or we could have a float for the age. We could also have a string for the name, or we could have a list for the name."
]
coderabbitai bot (Contributor) commented:

The explanation here is good, but it could be enhanced by explicitly stating that dictionaries in Python do not enforce types or schemas, which is why Pydantic is useful.

Comment on lines +70 to +75
"source": [
"for obj in data:\n",
" name = obj.get(\"first_name\")\n",
" age = obj.get(\"age\")\n",
" print(f\"{name} is {age}\")\n",
" print(f\"Next year he will be {age+1} years old\")"
coderabbitai bot (Contributor) commented:

This code block is meant to demonstrate the failure that occurs when types are not enforced. However, the error message in the traceback (lines 59-66) indicates a type error due to string concatenation with an integer, which is a different issue than what's being discussed. It would be more illustrative to show an error that directly results from incorrect types associated with the JSON keys.

Comment on lines +89 to +91
"## Pydantic to the rescue\n",
"\n",
"Pydantic is a library that allows us to define data structures, and then validate them. It also allows us to define data structures."
coderabbitai bot (Contributor) commented:

The introduction to Pydantic is succinct and to the point. It might be beneficial to mention that Pydantic can also be used for parsing and exporting data, which is a significant part of its functionality.
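
A minimal illustration of that parsing/exporting point, assuming Pydantic v2's model_validate_json and model_dump APIs:

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Parsing: validate raw JSON (or a dict) into a typed object; "30" is coerced to an int.
person = Person.model_validate_json('{"name": "Sam", "age": "30"}')

# Exporting: dump back out as a plain dict or a JSON string.
assert person.model_dump() == {"name": "Sam", "age": 30}
print(person.model_dump_json())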

Comment on lines +140 to +141
"person = Person.model_validate({\"name\": \"Sam\", \"age\": \"30\"})\n",
"person"
coderabbitai bot (Contributor) commented:

Person.model_validate is the standard Pydantic v2 entry point for validating a dict against the model (the v1 equivalent was parse_obj); instantiating directly with Person(**data) also works. Whichever form the tutorial uses should match the Pydantic version the project pins and be used consistently for clarity.

Comment on lines +162 to +163
"assert person.name == \"Sam\"\n",
"assert person.age == 20"
coderabbitai bot (Contributor) commented:

The assertion here is incorrect as it checks person.age == 20 while the instantiated person has an age of 30. This will raise an AssertionError.

- assert person.age == 20
+ assert person.age == 30

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
"assert person.name == \"Sam\"\n",
"assert person.age == 20"
"assert person.name == \"Sam\"\n",
"assert person.age == 30"

Comment on lines +593 to +612
"def validate_chain_of_thought(values):\n",
" chain_of_thought = values[\"chain_of_thought\"]\n",
" answer = values[\"answer\"]\n",
" resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a validator. Determine if the value follows from the statement. If it is not, explain why.\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"Verify that `{answer}` follows the chain of thought: {chain_of_thought}\",\n",
" },\n",
" ],\n",
" response_model=Validation,\n",
" )\n",
" if not resp.is_valid:\n",
" raise ValueError(resp.error_message)\n",
" return values"
coderabbitai bot (Contributor) commented:

The validate_chain_of_thought function demonstrates an advanced use case of integrating LLMs into the validation process. It's important to ensure that the LLM's responses are interpreted correctly and that the error messages are clear and actionable.

Comment on lines +710 to +733
"class QuestionAnswer(BaseModel):\n",
" question: str\n",
" answer: str\n",
"\n",
"question = \"What is the meaning of life?\"\n",
"context = \"The according to the devil the meaning of life is a life of sin and debauchery.\"\n",
"\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
" response_model=QuestionAnswer,\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a system that answers questions based on the context. answer exactly what the question asks using the context.\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"using the context: `{context}`\\n\\nAnswer the following question: `{question}`\",\n",
" },\n",
" ],\n",
")\n",
"\n",
"resp.answer"
coderabbitai bot (Contributor) commented:

The example of using the response_model parameter with the OpenAI API is a good demonstration of how to integrate structured output with the API. It's important to ensure that the response_model is designed to handle all possible responses from the API, including errors and edge cases.

Comment on lines +753 to +780
"from pydantic import BeforeValidator\n",
"\n",
"class QuestionAnswer(BaseModel):\n",
" question: str\n",
" answer: Annotated[\n",
" str,\n",
" BeforeValidator(\n",
" llm_validator(\"don't say objectionable things\")\n",
" ),\n",
" ]\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" response_model=QuestionAnswer,\n",
" max_retries=2,\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a system that answers questions based on the context. answer exactly what the question asks using the context.\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"using the context: `{context}`\\n\\nAnswer the following question: `{question}`\",\n",
" },\n",
" ],\n",
")\n",
"\n",
"resp.answer"
coderabbitai bot (Contributor) commented:

The use of BeforeValidator and llm_validator to ensure that responses do not contain objectionable content is a critical aspect of responsible AI deployment. It's important to test these validators thoroughly to ensure they are effective and do not over-filter content.

"resp = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n",
coderabbitai bot (Contributor) commented:

There is a typo in the word "yesturday" which should be corrected to "yesterday" to ensure the date is understood correctly.

- \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n",
+ \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json today is {datetime.date.today()}\"},\n",

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n",
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json today is {datetime.date.today()}\"},\n",

" \"content\": f\"\"\"\n",
" Today is {datetime.date.today()} \n",
"\n",
" Extract `Jason Liu is thirty years old his birthday is yesturday` \n",
coderabbitai bot (Contributor) commented:

The word "yesturday" is misspelled and should be corrected to "yesterday" for proper date parsing.

- Extract `Jason Liu is thirty years old his birthday is yesturday` 
+ Extract `Jason Liu is thirty years old his birthday is yesterday`

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
" Extract `Jason Liu is thirty years old his birthday is yesturday` \n",
" Extract `Jason Liu is thirty years old his birthday is yesterday` \n",
