
Add tutorials #154

Merged
jxnl merged 14 commits into main from tutorials on Nov 19, 2023

Conversation

@jxnl (Collaborator) commented on Nov 7, 2023

Summary by CodeRabbit

  • New Features

    • Introduced a new section on requesting JSON data from OpenAI.
    • Added examples of using Pydantic for data validation and schema definition.
    • Showcased the integration of OpenAI's ChatCompletion API for structured data extraction.
    • Demonstrated the creation of nested complex schemas with Pydantic.
  • Documentation

    • Updated README with instructions for using openai<1.0.0 and instructor libraries.
    • Added usage examples for async clients and new UserExtract class.
    • Enhanced project documentation with async client notes and response_model usage (a minimal usage sketch follows this summary).
    • Included a section on Pydantic validation and error handling in documentation.
  • Refactor

    • Added apatch to the public interface of the instructor library for async support.
    • Modified wrap_chatcompletion function to accept an is_async parameter.
  • Tests

    • Created new async test functions and integrated pytest.mark.asyncio.
    • Added tests for UserExtract model and async OpenAI client interactions.
  • Chores

    • Updated tutorials with practical examples of structured output for RAG models.
    • Enhanced knowledge graph handling with new methods and classes in tutorials.
    • Provided examples of accessing original responses in API calls in documentation.
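
To make the response_model flow summarized above concrete, here is a minimal sketch of the pattern the README and tests exercise with the openai>=1.0 client style. The prompt, model name, and exact field set are illustrative assumptions, not code copied from the PR.

import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserExtract(BaseModel):
    # Field names mirror the UserExtract model referenced in the tests; exact fields are assumed.
    name: str
    age: int

# patch() wraps the client so chat.completions.create accepts a response_model argument.
client = instructor.patch(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserExtract,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
assert isinstance(user, UserExtract)
print(user.name, user.age)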

coderabbitai bot (Contributor) commented on Nov 7, 2023

Walkthrough

The updates encompass the integration of Pydantic for structured data validation and schema definition, enhancements to the OpenAI SDK via the instructor library, and the introduction of async client capabilities. Tutorials have been enriched with practical examples of applying structured output to RAG models, and the documentation now includes guidance on using async clients and accessing original responses.

Changes

File Path | Change Summary
tutorials/1.introduction.ipynb | Added sections on Pydantic for data validation/schema definition and OpenAI's ChatCompletion API integration.
tutorials/2.applications-rag.ipynb | Introduced content on structured output application to RAG models and practical examples.
README.md | Updated usage instructions for the openai and instructor libraries, including async client examples.
docs/index.md | Updated documentation with async client usage and Pydantic validation examples.
instructor/__init__.py | Added apatch to the public interface.
instructor/patch.py | Added is_async parameter to wrap_chatcompletion and an apatch function.
tests/test_patch.py | Added async test functions and assertions for the UserExtract model.
tutorials/2.tips.ipynb | Included tips on schema engineering and classification methods using Enums and Literals.
tutorials/3.applications-rag.ipynb | Added content on improving RAG models and examples using the instructor library.
tutorials/4.knowledge-graphs.ipynb | Enhanced knowledge graph handling with new classes and methods.
tutorials/5.validation.ipynb | Implemented validation using Pydantic and the instructor library with new classes and validation scenarios.

🐇 In the code's weave, through autumn's bright leaves, 🍂
We hop to the patch, async in dispatch,
Structured and neat, our schemas complete,
With Pydantic's charm, we code without alarm. 🌟


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between a8a4b70 and b4e2126.
Files selected for processing (1)
  • tutorials/1.introduction.ipynb (1 hunks)
Files not reviewed due to errors (1)
  • tutorials/1.introduction.ipynb (Error: diff too large)

@jxnl added the documentation label (Improvements or additions to documentation) on Nov 8, 2023
@jxnl changed the title to Add tutorials on Nov 8, 2023
coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between b4e2126 and 144c792.
Files ignored due to filter (1)
  • pyproject.toml
Files selected for processing (1)
  • tutorials/2.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (1)
  • tutorials/2.applications-rag.ipynb (Error: diff too large)

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 144c792 and 1ec9114.
Files ignored due to filter (1)
  • pyproject.toml
Files selected for processing (7)
  • README.md (1 hunks)
  • docs/index.md (8 hunks)
  • instructor/__init__.py (2 hunks)
  • instructor/patch.py (2 hunks)
  • tests/test_patch.py (1 hunks)
  • tutorials/1.introduction.ipynb (1 hunks)
  • tutorials/2.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (1)
  • tutorials/1.introduction.ipynb (Error: diff too large)
Files skipped from review due to trivial changes (2)
  • docs/index.md
  • tutorials/2.applications-rag.ipynb
Additional comments: 10
instructor/__init__.py (2)
  • 1-4: The new import apatch from the .patch module is added. Ensure that the function or variable apatch is defined in the .patch module.

  • 11-17: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [6-17]

apatch is added to the __all__ list, which makes it part of the public interface of the instructor module. Make sure that this is the intended behavior and that the usage of apatch is documented.

README.md (3)
  • 45-45: The assertion is correct and ensures that the age is correctly extracted.

  • 48-62: The instructions for using openai<1.0.0 are clear and provide the necessary steps for patching a global client.

  • 65-88: The instructions for using async clients are clear and provide the necessary steps for patching an async client. The example provided is also correct.

instructor/patch.py (1)
  • 212-214: The wrap_chatcompletion function is called with is_async=True only for client.chat.completions.create. Ensure that other async methods of the client are also wrapped with is_async=True if needed.
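
For reference, a hypothetical sketch of the patch site being discussed; this is not the library's actual code, only an illustration of wrapping the async create method with is_async=True:

from openai import AsyncOpenAI

def apatch(client: AsyncOpenAI) -> AsyncOpenAI:
    # wrap_chatcompletion is the function under review in instructor/patch.py.
    # Only chat.completions.create is wrapped here; any other async entry points
    # patched later would also need is_async=True.
    client.chat.completions.create = wrap_chatcompletion(
        client.chat.completions.create, is_async=True
    )
    return client
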
tests/test_patch.py (4)
  • 1-5: Imports are correctly placed and organized according to PEP8 guidelines.

  • 8-9: The OpenAI and AsyncOpenAI clients are correctly patched with the instructor module.

  • 12-30: The test_async_runmodel function is correctly defined as an async function and marked with pytest.mark.asyncio. The UserExtract model is defined correctly and used as the response_model in the aclient.chat.completions.create call. The assertions correctly check the instance type of the returned model, the value of the name attribute, and the existence of the _raw_response attribute (a minimal sketch of this test pattern follows this list).

  • 32-32: The test_runmodel function is not implemented. Ensure to add the implementation or remove the function if it's not needed.
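
A minimal sketch of the async test pattern described above; the prompt and expected values are assumptions, not the PR's actual test code:

import pytest
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

# apatch() wraps the async client so create() accepts a response_model argument.
aclient = instructor.apatch(AsyncOpenAI())

class UserExtract(BaseModel):
    name: str
    age: int

@pytest.mark.asyncio
async def test_async_runmodel():
    model = await aclient.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserExtract,
        messages=[{"role": "user", "content": "Extract jason is 25 years old"}],
    )
    assert isinstance(model, UserExtract), "Should be an instance of UserExtract"
    assert model.name.lower() == "jason"
    assert hasattr(model, "_raw_response"), "The raw response should be attached"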

@@ -134,9 +134,7 @@ def retry_sync(
raise e
coderabbitai bot (Contributor) commented:

Ensure that the exception e is logged for debugging purposes before it's raised again.

- raise e
+ logging.exception(e)
+ raise e

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Make sure it correctly replaces the highlighted code, has no missing lines, and has no indentation issues.

Suggested change
raise e
logging.exception(e)
raise e

Comment on lines 137 to 140
def wrap_chatcompletion(func: Callable, is_async: bool = None) -> Callable:
    @wraps(func)
    async def new_chatcompletion_async(
        response_model=None,
coderabbitai bot (Contributor) commented:

The is_async parameter is not documented in the function docstring. Add a description for is_async in the function docstring.
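
For example, a docstring along these lines would cover it (the wording is a suggestion, not the library's text):

from typing import Callable

def wrap_chatcompletion(func: Callable, is_async: bool = False) -> Callable:
    """Wrap a chat completion callable so it accepts a response_model argument.

    Args:
        func: The original chat.completions.create callable to wrap.
        is_async: Whether func is a coroutine function; when True the async
            wrapper is returned instead of the sync one.
    """
    ...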

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 1ec9114 and 9905497.
Files selected for processing (3)
  • tutorials/1.introduction.ipynb (1 hunks)
  • tutorials/2.tips.ipynb (1 hunks)
  • tutorials/3.applications-rag.ipynb (1 hunks)
Files not reviewed due to errors (3)
  • tutorials/1.introduction.ipynb (Error: diff too large)
  • tutorials/2.tips.ipynb (Error: diff too large)
  • tutorials/3.applications-rag.ipynb (Error: diff too large)

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 9905497 and 075d6b9.
Files selected for processing (1)
  • tutorials/4.knowledge-graphs.ipynb (1 hunks)
Files not summarized due to errors (1)
  • tutorials/4.knowledge-graphs.ipynb: Error: Message exceeds token limit
Files not reviewed due to errors (1)
  • tutorials/4.knowledge-graphs.ipynb (Error: diff too large)

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 075d6b9 and e7b5992.
Files selected for processing (1)
  • docs/index.md (1 hunks)
Files skipped from review due to trivial changes (1)
  • docs/index.md

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between e7b5992 and 37d82ae.
Files selected for processing (1)
  • tutorials/5.validation.ipynb (1 hunks)
Files not reviewed due to errors (1)
  • tutorials/5.validation.ipynb (Error: diff too large)

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 37d82ae and ea6de65.
Files selected for processing (1)
  • tutorials/4.knowledge-graphs.ipynb (1 hunks)
Additional comments: 10
tutorials/4.knowledge-graphs.ipynb (10)
  • 7-28: The introduction to knowledge graphs is clear and informative, providing a good foundation for the rest of the tutorial.

  • 49-50: The explanation of the libraries used is concise and gives the reader a clear understanding of their purpose in the tutorial.

  • 67-70: The import statements and the client initialization look correct, but it's important to ensure that the instructor library's patch function is compatible with the OpenAI class from the openai library.

  • 97-109: The Node and Edge classes are well-defined using Pydantic, which will help with data validation and error handling.

  • 139-152: The KnowledgeGraph class and its visualize_knowledge_graph method are well implemented. The use of the graphviz library for visualization is appropriate.

  • 174-184: The generate_graph function is designed to interact with an AI model to generate a knowledge graph. Ensure that the client.chat.completions.create method exists and that the response_model parameter correctly handles the conversion to a KnowledgeGraph object.

  • 370-385: The addition of the __hash__ method to the Node and Edge classes is a good practice for handling duplicates, especially when using these objects in sets.

  • 395-415: The KnowledgeGraph class has been updated to make the nodes and edges fields optional, which adds flexibility, and the update method is a good addition for merging graphs. However, ensure that the deduplication logic in update works as intended: combining lists and converting to a set only deduplicates if the objects are hashable and their equality is defined correctly (a minimal sketch of hashable nodes and set-based merging follows this list).

  • 514-521: The example use case for generating a knowledge graph from text chunks is a practical demonstration of the iterative graph generation process.

  • 535-590: The conclusion provides a good summary of the tutorial's content and suggests further exercises for the reader, which is a great way to encourage practice and deeper understanding.
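
A minimal sketch of hashable Node/Edge models and a set-based update, following the structure described in the comments above; the field names are illustrative assumptions, not copied from the notebook:

from typing import List, Optional
from pydantic import BaseModel, Field

class Node(BaseModel):
    id: int
    label: str

    def __hash__(self) -> int:
        # Hash on identity-defining fields so duplicates collapse inside a set.
        return hash((self.id, self.label))

class Edge(BaseModel):
    source: int
    target: int
    label: str

    def __hash__(self) -> int:
        return hash((self.source, self.target, self.label))

class KnowledgeGraph(BaseModel):
    nodes: Optional[List[Node]] = Field(default_factory=list)
    edges: Optional[List[Edge]] = Field(default_factory=list)

    def update(self, other: "KnowledgeGraph") -> "KnowledgeGraph":
        # set() only deduplicates because Node and Edge define __hash__;
        # equality still compares all fields, so near-duplicates survive.
        return KnowledgeGraph(
            nodes=list(set(self.nodes + other.nodes)),
            edges=list(set(self.edges + other.edges)),
        )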

"metadata": {},
"outputs": [],
"source": [
"!pip install instructor graphviz --quiet"
coderabbitai bot (Contributor) commented:

The installation command for the required libraries is correct, but it's always a good practice to specify the version of the libraries to ensure compatibility.

Comment on lines 449 to 487
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n",
" # Initialize an empty KnowledgeGraph\n",
" cur_state = KnowledgeGraph()\n",
"\n",
" # Iterate over the input list\n",
" for i, inp in enumerate(input):\n",
" new_updates = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo-16k\",\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"\"\"You are an iterative knowledge graph builder.\n",
" You are given the current state of the graph, and you must append the nodes and edges \n",
" to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Extract any new nodes and edges from the following:\n",
" # Part {i}/{len(input)} of the input:\n",
"\n",
" {inp}\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Here is the current state of the graph:\n",
" {cur_state.model_dump_json(indent=2)}\"\"\",\n",
" },\n",
" ],\n",
" response_model=KnowledgeGraph,\n",
" ) # type: ignore\n",
"\n",
" # Update the current state with the new updates\n",
" cur_state = cur_state.update(new_updates)\n",
"\n",
" # Draw the current state of the graph\n",
" cur_state.visualize_knowledge_graph() \n",
" \n",
" # Return the final state of the KnowledgeGraph\n",
" return cur_state\n"
coderabbitai bot (Contributor) commented:

The iterative generate_graph function is a significant improvement for handling larger datasets, and using the update method to merge the graphs is a good approach. However, ensure that the client.chat.completions.create method can handle the iterative prompts and that the response_model parameter is used correctly. Also, the system role message has typos ("procide" should be "provide", "duplcates" should be "duplicates").

-                    to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n",
+                    to it. Do not provide any duplicates and try to reuse nodes as much as possible.\"\"\",\n",

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n",
" # Initialize an empty KnowledgeGraph\n",
" cur_state = KnowledgeGraph()\n",
"\n",
" # Iterate over the input list\n",
" for i, inp in enumerate(input):\n",
" new_updates = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo-16k\",\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"\"\"You are an iterative knowledge graph builder.\n",
" You are given the current state of the graph, and you must append the nodes and edges \n",
" to it Do not procide any duplcates and try to reuse nodes as much as possible.\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Extract any new nodes and edges from the following:\n",
" # Part {i}/{len(input)} of the input:\n",
"\n",
" {inp}\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Here is the current state of the graph:\n",
" {cur_state.model_dump_json(indent=2)}\"\"\",\n",
" },\n",
" ],\n",
" response_model=KnowledgeGraph,\n",
" ) # type: ignore\n",
"\n",
" # Update the current state with the new updates\n",
" cur_state = cur_state.update(new_updates)\n",
"\n",
" # Draw the current state of the graph\n",
" cur_state.visualize_knowledge_graph() \n",
" \n",
" # Return the final state of the KnowledgeGraph\n",
" return cur_state\n"
"def generate_graph(input: List[str]) -> KnowledgeGraph:\n",
" # Initialize an empty KnowledgeGraph\n",
" cur_state = KnowledgeGraph()\n",
"\n",
" # Iterate over the input list\n",
" for i, inp in enumerate(input):\n",
" new_updates = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo-16k\",\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"\"\"You are an iterative knowledge graph builder.\n",
" You are given the current state of the graph, and you must append the nodes and edges \n",
" to it. Do not provide any duplicates and try to reuse nodes as much as possible.\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Extract any new nodes and edges from the following:\n",
" # Part {i}/{len(input)} of the input:\n",
"\n",
" {inp}\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Here is the current state of the graph:\n",
" {cur_state.model_dump_json(indent=2)}\"\"\",\n",
" },\n",
" ],\n",
" response_model=KnowledgeGraph,\n",
" ) # type: ignore\n",
"\n",
" # Update the current state with the new updates\n",
" cur_state = cur_state.update(new_updates)\n",
"\n",
" # Draw the current state of the graph\n",
" cur_state.visualize_knowledge_graph() \n",
" \n",
" # Return the final state of the KnowledgeGraph\n",
" return cur_state\n"

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 6

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between ea6de65 and 9774df5.
Files selected for processing (1)
  • tutorials/1.introduction.ipynb (1 hunks)
Additional comments: 12
tutorials/1.introduction.ipynb (12)
  • 1-14: The introduction is clear and sets the stage for the problem that the notebook aims to solve. It's good practice to link to additional resources for users who want more background information.

  • 20-22: The explanation of the problem with working with raw JSON is clear and sets up the rationale for using a library like Pydantic.

  • 26-34: This code block demonstrates the potential inconsistencies in JSON data. It's a good example of the kind of data issues Pydantic can help address.

  • 111-119: This code block effectively demonstrates how to define a Pydantic model and instantiate it with valid data.

  • 186-187: This code block demonstrates Pydantic's validation error messages, which is useful for understanding how Pydantic handles invalid data.

  • 201-232: This section introduces integration with the OpenAI API, which is a practical application of Pydantic models for handling structured data from an external service.

  • 298-313: Here, the notebook demonstrates extending a Pydantic model to include additional fields and shows how to handle more complex data structures. This is a good demonstration of Pydantic's extensibility.

  • 320-327: The explanation of function calling and its use in the context of the OpenAI API is informative and relevant to the notebook's topic.

  • 381-402: This code block demonstrates generating a JSON schema from a Pydantic model, which is a powerful feature for documentation and validation purposes.

  • 443-454: The example of defining nested schemas with Pydantic is excellent, showing the library's capability to handle complex data structures with ease (a minimal nested-schema sketch follows this list).

  • 461-462: The conclusion ties back to the library instructor and its purpose, providing context for the examples given in the notebook.

  • 482-499: This final code block shows how to integrate the instructor library with the OpenAI SDK, demonstrating a practical application of the concepts discussed in the notebook.
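
A minimal nested-schema sketch of the kind these cells cover, assuming Pydantic v2's model_json_schema; the field names are illustrative:

from typing import List
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address]

# Pydantic emits a JSON Schema for the whole nested structure, with Address under $defs.
print(Person.model_json_schema())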

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 9774df5 and 90d8a8c.
Files selected for processing (1)
  • docs/index.md (1 hunks)
Files skipped from review due to trivial changes (1)
  • docs/index.md

@jxnl merged commit 9269f23 into main on Nov 19, 2023
@jxnl deleted the tutorials branch on November 19, 2023 at 01:07
coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 18

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 90d8a8c and d02111d.
Files selected for processing (5)
  • tutorials/1.introduction.ipynb (1 hunks)
  • tutorials/2.tips.ipynb (1 hunks)
  • tutorials/3.applications-rag.ipynb (1 hunks)
  • tutorials/4.knowledge-graphs.ipynb (1 hunks)
  • tutorials/5.validation.ipynb (1 hunks)
Files not summarized due to errors (3)
  • tutorials/1.introduction.ipynb: Error: Message exceeds token limit
  • tutorials/4.knowledge-graphs.ipynb: Error: Message exceeds token limit
  • tutorials/5.validation.ipynb: Error: Message exceeds token limit
Additional comments: 31
tutorials/2.tips.ipynb (6)
  • 34-83: The code demonstrates the use of Enums in Pydantic models and how to use them with OpenAI's API. The House enum is correctly defined without using auto() to ensure the values are meaningful strings rather than integers. The Character model is then used in a request to the OpenAI API, and the response is dumped using resp.model_dump(). This is a good practice for ensuring that the response adheres to the expected schema.

  • 86-118: This cell shows an alternative approach using Literals instead of Enums, which is appropriate when the set of values is small and fixed. The code is correct and follows best practices for type hinting in Pydantic (a brief Enum-versus-Literal sketch follows this file's comments).

  • 132-177: The code cell introduces a way to handle arbitrary properties by defining a Property model and including a list of these in the Character model. This is a flexible approach that can handle various data without needing to know the exact schema beforehand. The use of a list of Property objects within the Character model is a good example of nested models in Pydantic.

  • 258-315: The code cell shows how to define multiple entities within a single model. This is a common pattern when dealing with collections of items in APIs and is correctly implemented here.
    [APPROVED]

  • 329-368: This cell demonstrates defining relationships between entities using lists of references (in this case, user IDs to represent friends). This is a common pattern in data modeling and is well implemented here.

  • 372-511: The final code cell uses Graphviz to visualize the relationships between entities. This is a practical example of how to use Python libraries to create visual representations of data structures. The code correctly checks to avoid duplicating edges by ensuring that the friend ID is greater than the user ID before adding an edge.
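
For reference, a brief Enum-versus-Literal sketch of the pattern these cells cover; the specific house values are illustrative assumptions, not the notebook's exact code:

from enum import Enum
from typing import Literal
from pydantic import BaseModel

class House(Enum):
    # Explicit string values (not auto()) so the schema carries meaningful names.
    GRYFFINDOR = "Gryffindor"
    HUFFLEPUFF = "Hufflepuff"
    RAVENCLAW = "Ravenclaw"
    SLYTHERIN = "Slytherin"

class CharacterWithEnum(BaseModel):
    name: str
    house: House

class CharacterWithLiteral(BaseModel):
    # Literal is the lighter-weight option when the value set is small and fixed.
    name: str
    house: Literal["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]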

tutorials/3.applications-rag.ipynb (16)
  • 7-26: The introduction to RAG models is clear and informative. It provides a good foundation for readers who are new to the concept. The use of an image to illustrate the process is also helpful.

  • 33-55: The section on the limitations of simple RAG models is well-structured and provides concrete examples of the challenges faced when using such models. This sets the stage for the subsequent sections on improving RAG models.

  • 62-72: The explanation of query understanding as a solution to improve RAG models is concise and to the point. The accompanying image likely adds value by visually representing the concept.

  • 86-101: The introduction to the instructor library and its integration with the OpenAI API is clear. However, ensure that the OpenAI class and its methods are up to date with the current OpenAI API.

  • 128-131: The Extraction model is well-defined using Pydantic, which will help in validating the structured data. The use of Field with descriptions is a good practice for documentation and clarity.

  • 168-203: The code example demonstrates how to use the instructor library to create structured outputs from a text chunk. The output is clearly printed, showing the structured data. Ensure that the model="gpt-4-1106-preview" is a valid model identifier and that the response_model parameter is correctly implemented in the instructor library.

  • 210-210: The explanation of how embedding summaries, hypothetical questions, and keywords can improve the search results is insightful and demonstrates a practical application of structured output.

  • 217-236: The introduction of temporal context to queries is a good example of how structured output can enhance query understanding. The DateRange and Query models are well-defined (a minimal sketch follows this file's comments).

  • 243-252: The explanation of how the structured query can be used to optimize backend search results is clear and provides a good use case for the models defined earlier.

  • 275-290: The code example for adding temporal context to a query is well-structured. However, ensure that the model="gpt-4-1106-preview" is still a valid model identifier and that the response_model parameter is correctly implemented in the instructor library.

  • 353-359: The explanation of how personal assistants can benefit from structured output to handle parallel processing and fetch information from multiple backends is insightful and sets the stage for a practical example.
    [APPROVED]

  • 368-379: The SearchClient and Retrival models are well-defined using Pydantic. This structured approach will help in validating the data and ensuring that the queries are well-formed.

  • 424-432: The code example for dispatching queries to different backends using structured output is clear. Ensure that the model="gpt-4-1106-preview" is still a valid model identifier and that the response_model parameter is correctly implemented in the instructor library.

  • 504-517: The section on decomposing questions into sub-questions is a complex but valuable example of using structured output to enhance query understanding. It shows how to break down a problem into smaller, manageable parts.

  • 568-588: The Question and QueryPlan models are well-defined, and the code example demonstrates how to use structured output to decompose a complex query into sub-questions. Ensure that the model="gpt-4-1106-preview" is still a valid model identifier and that the response_model parameter is correctly implemented in the instructor library.

  • 595-597: The closing remarks summarize the section well and highlight the potential of structured outputs in leveraging language models for system components.
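
A minimal sketch of the temporal-context models described above; the field names and descriptions are assumptions, not the notebook's exact definitions:

import datetime
from typing import Optional
from pydantic import BaseModel, Field

class DateRange(BaseModel):
    start: datetime.date
    end: datetime.date

class Query(BaseModel):
    rewritten_query: str = Field(description="Query rewritten for the search backend")
    published_daterange: Optional[DateRange] = Field(
        default=None, description="Only return documents published in this range"
    )

# A patched client could then fill this in via
# client.chat.completions.create(..., response_model=Query), as in the comments above.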

tutorials/4.knowledge-graphs.ipynb (4)
  • 58-61: The instructor library is being used to patch the OpenAI client. This is a good use of a wrapper to extend functionality, but it's important to ensure that the instructor library is actively maintained and compatible with the version of the OpenAI client being used. If the instructor library modifies the behavior of the OpenAI client, it could potentially introduce unexpected side effects or bugs.

  • 127-143: The visualize_knowledge_graph method uses the graphviz library to visualize the knowledge graph, which is a good approach for generating visual representations. However, ensure that graphviz is installed in the environment where this notebook will run, as it is an external dependency (a short visualization sketch follows this file's comments).

  • 896-903: The example use case provided at the end of the notebook demonstrates how to use the generate_graph function with a list of text chunks. This is a good demonstration of the iterative graph generation process. However, ensure that the generate_graph function is fully tested and handles various input cases correctly, especially with regard to error handling and API response parsing mentioned earlier.

  • 917-972: The conclusion provides a summary of what was covered in the tutorial and suggests exercises for the reader. This is a good educational practice as it encourages the reader to apply what they've learned. However, ensure that the examples provided in the exercises are feasible with the current implementation of the KnowledgeGraph class and related functions. If additional functionality is required to complete these exercises, consider providing that in the tutorial or as supplementary material.
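
A short sketch of the visualization step, reusing the Node/Edge/KnowledgeGraph models sketched earlier in this review; the output filename is an arbitrary assumption:

from graphviz import Digraph

def visualize_knowledge_graph(kg: "KnowledgeGraph") -> None:
    dot = Digraph(comment="Knowledge Graph")
    for node in kg.nodes:
        dot.node(str(node.id), node.label)
    for edge in kg.edges:
        dot.edge(str(edge.source), str(edge.target), label=edge.label)
    # Writes knowledge_graph.gv and opens the rendered image; requires the Graphviz system binaries.
    dot.render("knowledge_graph", view=True)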

tutorials/5.validation.ipynb (5)
  • 16-20: The explanation of Pydantic's role in validation and the instructor library's extension of its capabilities is clear and informative. It's important to ensure that the links to external documentation are kept up-to-date to maintain the usefulness of this tutorial.

  • 65-68: The example validator function name_must_contain_space is a good demonstration of custom validation logic: it is simple and demonstrates the concept effectively (a minimal field_validator sketch follows this file's comments).

  • 367-368: The use of Pydantic's Field constraints to limit the length of a message is a good example of using built-in validators. This is a simple and effective way to enforce message length constraints.

  • 417-431: The use of context in the AnswerWithCitation model is a sophisticated example of validation that ensures the citation is present in the provided text chunk, which is good practice for ensuring the accuracy of referenced information (a context-based validator sketch follows the summary below).

  • 796-805: The conclusion provides a good summary of the tutorial's content and offers a to-do list for further exploration. This is a great way to encourage continued learning and application of the concepts covered in the tutorial.
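
A minimal field_validator sketch of the kind of check described above, assuming Pydantic v2; the model name is illustrative:

from pydantic import BaseModel, field_validator

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_contain_space(cls, v: str) -> str:
        # Single-token names are rejected; the error message is what the caller
        # (or the LLM, on an instructor retry) sees.
        if " " not in v:
            raise ValueError("name must contain a space, e.g. first and last name")
        return v.title()

UserDetail(name="jason liu", age=30)   # ok, name normalized to "Jason Liu"
# UserDetail(name="jason", age=30)     # raises ValidationError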

Overall, the tutorial content is well-structured and covers important aspects of data validation and integration with OpenAI's API. The use of Pydantic and the instructor library is well-explained, and the examples are practical and relevant. It's important to ensure that the code is robust, handles edge cases, and performs well under different conditions. Additionally, the tutorial should encourage best practices such as error handling, caching, and performance optimization.
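
As a reference for the context-based check mentioned for AnswerWithCitation, here is a minimal sketch using Pydantic v2's validation context; any field names beyond those in the comment are assumptions:

from pydantic import BaseModel, ValidationInfo, field_validator

class AnswerWithCitation(BaseModel):
    answer: str
    citation: str

    @field_validator("citation")
    @classmethod
    def citation_must_appear_in_context(cls, v: str, info: ValidationInfo) -> str:
        # The source text is passed in via the validation context at call time.
        context = info.context or {}
        text_chunk = context.get("text_chunk", "")
        if v not in text_chunk:
            raise ValueError(f"citation `{v}` not found in the provided text chunk")
        return v

# Supplying the context at validation time:
# AnswerWithCitation.model_validate(data, context={"text_chunk": source_text})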

Comment on lines +41 to +42
"We have a `name` field, which is a string, and an `age` field, which is an integer. However, if we were to load this into a dictionary, we would have no way of knowing if the data is valid. For example, we could have a string for the age, or we could have a float for the age. We could also have a string for the name, or we could have a list for the name."
]
coderabbitai bot (Contributor) commented:

The explanation here is good, but it could be enhanced by explicitly stating that dictionaries in Python do not enforce types or schemas, which is why Pydantic is useful.

Comment on lines +70 to +75
"source": [
"for obj in data:\n",
" name = obj.get(\"first_name\")\n",
" age = obj.get(\"age\")\n",
" print(f\"{name} is {age}\")\n",
" print(f\"Next year he will be {age+1} years old\")"
coderabbitai bot (Contributor) commented:

This code block is meant to demonstrate the failure that occurs when types are not enforced. However, the error message in the traceback (lines 59-66) indicates a type error due to string concatenation with an integer, which is a different issue than what's being discussed. It would be more illustrative to show an error that directly results from incorrect types associated with the JSON keys.

Comment on lines +89 to +91
"## Pydantic to the rescue\n",
"\n",
"Pydantic is a library that allows us to define data structures, and then validate them. It also allows us to define data structures."
coderabbitai bot (Contributor) commented:

The introduction to Pydantic is succinct and to the point. It might be beneficial to mention that Pydantic can also be used for parsing and exporting data, which is a significant part of its functionality.
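
A minimal illustration of that parsing/exporting point, assuming Pydantic v2's model_validate_json and model_dump APIs:

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Parsing: validate raw JSON (or a dict) into a typed object; "30" is coerced to an int.
person = Person.model_validate_json('{"name": "Sam", "age": "30"}')

# Exporting: dump back out as a plain dict or a JSON string.
assert person.model_dump() == {"name": "Sam", "age": 30}
print(person.model_dump_json())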

Comment on lines +140 to +141
"person = Person.model_validate({\"name\": \"Sam\", \"age\": \"30\"})\n",
"person"
coderabbitai bot (Contributor) commented:

Person.model_validate is the standard Pydantic v2 entry point for validating a dict against the model (the v1 equivalent was parse_obj); instantiating directly with Person(**data) also works. Whichever form the tutorial uses should match the Pydantic version the project pins and be used consistently for clarity.

Comment on lines +162 to +163
"assert person.name == \"Sam\"\n",
"assert person.age == 20"
coderabbitai bot (Contributor) commented:

The assertion here is incorrect as it checks person.age == 20 while the instantiated person has an age of 30. This will raise an AssertionError.

- assert person.age == 20
+ assert person.age == 30

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
"assert person.name == \"Sam\"\n",
"assert person.age == 20"
"assert person.name == \"Sam\"\n",
"assert person.age == 30"

Comment on lines +593 to +612
"def validate_chain_of_thought(values):\n",
" chain_of_thought = values[\"chain_of_thought\"]\n",
" answer = values[\"answer\"]\n",
" resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a validator. Determine if the value follows from the statement. If it is not, explain why.\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"Verify that `{answer}` follows the chain of thought: {chain_of_thought}\",\n",
" },\n",
" ],\n",
" response_model=Validation,\n",
" )\n",
" if not resp.is_valid:\n",
" raise ValueError(resp.error_message)\n",
" return values"
coderabbitai bot (Contributor) commented:

The validate_chain_of_thought function demonstrates an advanced use case of integrating LLMs into the validation process. It's important to ensure that the LLM's responses are interpreted correctly and that the error messages are clear and actionable.

Comment on lines +710 to +733
"class QuestionAnswer(BaseModel):\n",
" question: str\n",
" answer: str\n",
"\n",
"question = \"What is the meaning of life?\"\n",
"context = \"The according to the devil the meaning of life is a life of sin and debauchery.\"\n",
"\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
" response_model=QuestionAnswer,\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a system that answers questions based on the context. answer exactly what the question asks using the context.\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"using the context: `{context}`\\n\\nAnswer the following question: `{question}`\",\n",
" },\n",
" ],\n",
")\n",
"\n",
"resp.answer"
coderabbitai bot (Contributor) commented:

The example of using the response_model parameter with the OpenAI API is a good demonstration of how to integrate structured output with the API. It's important to ensure that the response_model is designed to handle all possible responses from the API, including errors and edge cases.

Comment on lines +753 to +780
"from pydantic import BeforeValidator\n",
"\n",
"class QuestionAnswer(BaseModel):\n",
" question: str\n",
" answer: Annotated[\n",
" str,\n",
" BeforeValidator(\n",
" llm_validator(\"don't say objectionable things\")\n",
" ),\n",
" ]\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" response_model=QuestionAnswer,\n",
" max_retries=2,\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a system that answers questions based on the context. answer exactly what the question asks using the context.\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": f\"using the context: `{context}`\\n\\nAnswer the following question: `{question}`\",\n",
" },\n",
" ],\n",
")\n",
"\n",
"resp.answer"
coderabbitai bot (Contributor) commented:

The use of BeforeValidator and llm_validator to ensure that responses do not contain objectionable content is a critical aspect of responsible AI deployment. It's important to test these validators thoroughly to ensure they are effective and do not over-filter content.

"resp = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n",
coderabbitai bot (Contributor) commented:

There is a typo in the word "yesturday" which should be corrected to "yesterday" to ensure the date is understood correctly.

- \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n",
+ \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json today is {datetime.date.today()}\"},\n",

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n",
" {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json today is {datetime.date.today()}\"},\n",

" \"content\": f\"\"\"\n",
" Today is {datetime.date.today()} \n",
"\n",
" Extract `Jason Liu is thirty years old his birthday is yesturday` \n",
coderabbitai bot (Contributor) commented:

The word "yesturday" is misspelled and should be corrected to "yesterday" for proper date parsing.

- Extract `Jason Liu is thirty years old his birthday is yesturday` 
+ Extract `Jason Liu is thirty years old his birthday is yesterday`

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
" Extract `Jason Liu is thirty years old his birthday is yesturday` \n",
" Extract `Jason Liu is thirty years old his birthday is yesterday` \n",
