
Support Streaming MultiTask with response_model #221

Merged: 13 commits merged into instructor-ai:main on Nov 26, 2023

Conversation

@Anmol6 (Contributor) commented Nov 25, 2023

Resolves #213

Summary by CodeRabbit

  • New Features
    • Added search functionality to the app.

@coderabbitai (bot, Contributor) commented Nov 25, 2023

Walkthrough

The changes revolve around enhancing the streaming functionality to work with response_model in a multitask context. The updates allow for a more intuitive developer experience by enabling the direct iteration over streamed responses as instances of a specified model. This is achieved by introducing checks for iterable response models and handling them accordingly within the streaming process.
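
For orientation, here is a minimal sketch of the usage pattern this PR enables, assembled from the docs and example changes reviewed below; the model name, prompt, and `instructor.patch` call follow the conventions visible elsewhere in this PR, so treat the details as illustrative rather than canonical:

```python
from typing import Iterable

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.patch(OpenAI())

class User(BaseModel):
    name: str
    age: int

# With stream=True and an Iterable[...] response_model, each object that
# completes in the streamed JSON is yielded as a parsed User instance.
users = client.chat.completions.create(
    model="gpt-4",
    temperature=0.1,
    stream=True,
    response_model=Iterable[User],
    messages=[
        {"role": "user", "content": "Jason is 10 and Sarah is 30."},
    ],
)

for user in users:
    assert isinstance(user, User)
    print(user)
```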

Changes

| File Path | Change Summary |
| --- | --- |
| examples/streaming_multitask/... | Updated `stream_extract` to return `Iterable[User]` directly from `client.chat.completions.create` and removed the `cls` parameter. |
| instructor/function_calls.py | Added a `stream_multitask` parameter to the `from_response` method for conditional handling based on streaming. |
| instructor/patch.py | Introduced checks for `Iterable` and `MultiTaskBase` in response handling; added a `stream` parameter to functions. |
| tests/openai/test_multitask.py | Added new test functions for different modes and updated `stream_extract` usage to include a `mode` argument. |
| docs/concepts/... | Renamed `MultiUser` to `Users` and updated usage in documentation. |

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Support Streaming MultiTask with response_model (#213) | Yes | The changes include checks for iterable types and proper handling of the `MultiTask` and `MultiTaskBase` classes, aligning with the objective to support streaming with `response_model`. |

Poem

In the code where streams do flow, 🌊
The User models now align in a row.
With each update, like a hop, 🐇💻
We iterate with ease, non-stop.



@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 359c5f9 and 25e8e32.
Files selected for processing (4)
  • examples/streaming_multitask/streaming_multitask.py (2 hunks)
  • instructor/function_calls.py (2 hunks)
  • instructor/patch.py (6 hunks)
  • tests/openai/test_multitask.py (2 hunks)
Additional comments: 12
examples/streaming_multitask/streaming_multitask.py (3)
  • 16-17:
    The User model now includes an age field. Ensure that all uses of the User model throughout the codebase are updated to handle this new field.

  • 21-26:
    The stream_extract function has been successfully refactored to return an Iterable[User] and no longer requires the cls parameter, simplifying the function's interface. (A sketch of the refactored function follows this list.)

  • 45-50:
    The usage of stream_extract in the loop has been updated to match its new signature, and the loop correctly processes the iterable of User objects.
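
Based on the change summary and the comments above, the refactored example plausibly reduces to the following sketch; the imports, setup, and prompt wording are assumptions, not the exact file contents:

```python
from typing import Iterable

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.patch(OpenAI())

class User(BaseModel):
    name: str
    age: int

def stream_extract(input: str) -> Iterable[User]:
    # response_model=Iterable[User] replaces the old cls parameter; the
    # streaming completion is returned directly and parsed as it arrives.
    return client.chat.completions.create(
        model="gpt-4",
        temperature=0.1,
        stream=True,
        response_model=Iterable[User],
        messages=[
            {"role": "user", "content": f"Extract the users: {input}"},
        ],
    )

for user in stream_extract("Jason is 10 and Sarah is 30."):
    print(user)
```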

instructor/function_calls.py (3)
  • 190-194: The new parameter stream_multitask has been added to the from_response method. Ensure that all calls to this method have been updated to include this new parameter where necessary, or that default behavior is acceptable in existing calls.

  • 206-209: The conditional handling for stream_multitask is implemented correctly. However, ensure that the from_streaming_response method is defined and properly handles the streaming response as expected.

  • 211-213: The message variable is accessed with dot notation, which assumes that completion.choices[0].message is an object with attributes rather than a dictionary. This is inconsistent with the previous usage in the from_response method where completion is treated as a dictionary. Verify that completion is indeed an object with the expected structure or adjust the code to use dictionary accessors.

instructor/patch.py (5)
  • 3-6: The imports for MultiTask, MultiTaskBase, get_origin, get_args, and Iterable are correctly added to support the new streaming multitask functionality. These imports are necessary for the changes described in the summary.

  • 52-60: The handle_response_model function has been updated to handle the Iterable type in the response_model. The logic to create a MultiTask instance based on the iterable element class is correct and aligns with the summary. However, there is a potential issue with the type ignore comment on line 59. It's important to ensure that the openai_schema function can handle the response_model correctly and that the type ignore is justified. (A standalone sketch of the Iterable detection follows this list.)

  • 126-137: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [112-136]

The process_response function now includes a stream parameter and handles the response differently if stream is set to True and the response_model is a subclass of MultiTaskBase. This is consistent with the summary and allows for the correct processing of streaming multitask responses.

  • 150-163: The retry_async function has been updated to include the stream parameter and pass it to the process_response function. This ensures that the streaming behavior is consistently applied across retries. However, there is a potential issue with the error handling in the except block. The dump_message function is called with response.choices[0].message, but there is no check to ensure that response.choices and response.choices[0] are not empty, which could lead to an IndexError if the response does not contain any choices.

  • 187-200: Similarly, the retry_sync function has been updated to handle the stream parameter. The same potential issue with error handling in the except block applies here as well. It is necessary to verify that response.choices and response.choices[0] are not empty before attempting to access response.choices[0].message.
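
To make the Iterable check concrete, here is a standalone typing demonstration of how `get_origin`/`get_args` can detect an `Iterable[...]` annotation and recover its element class; this is a generic sketch, not the exact `handle_response_model` code:

```python
import collections.abc
from typing import Iterable, get_args, get_origin

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

response_model = Iterable[User]

# get_origin unwraps the generic alias to its runtime class;
# get_args recovers the element type used to build the multitask model.
if get_origin(response_model) is collections.abc.Iterable:
    (element_cls,) = get_args(response_model)
    print(element_cls)  # <class '__main__.User'>
```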

tests/openai/test_multitask.py (1)
  • 52-55: The stream_extract function in test_multi_user_legacy is correctly left unchanged as per the summary. However, the cls parameter is no longer used in the new stream_extract function, and the summary states that it was removed. If the intention was to remove it from the legacy function as well, this needs to be addressed. Otherwise, the legacy function is fine as is.

@jxnl (Collaborator) left a comment

Well, this is amazing. Do you mind adding a little bit to the documentation?

@Anmol6 changed the title from "Support Streaming MultiTask with response_model #213" to "Support Streaming MultiTask with response_model (Issue #213)" on Nov 25, 2023.
@jxnl (Collaborator) left a comment

Also, does this work for JSON mode or tool calls too?

@Anmol6 changed the title from "Support Streaming MultiTask with response_model (Issue #213)" to "Support Streaming MultiTask with response_model (resolves #213)" on Nov 25, 2023.
@Anmol6 (Contributor, Author) commented Nov 25, 2023

> Also, does this work for JSON mode or tool calls too?

just merged from main, updating it right now!

Review thread on the diff:

```python
Users = Iterable[User]

def stream_extract(input: str) -> Iterable[User]:
    completion = client.chat.completions.create(
```

@jxnl (Collaborator) suggested:

```diff
-def stream_extract(input: str) -> Iterable[User]:
+def stream_extract(input: str) -> Users:
```

this would work too, right? or even `response_model=Iterable[User]`

love it

@Anmol6 (Contributor, Author) replied Nov 25, 2023

yup, both work

@jxnl (Collaborator) commented Nov 25, 2023

at least do something that just says only functions are allowed, etc.

@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 25e8e32 and 065c754.
Files selected for processing (5)
  • docs/concepts/patching.md (1 hunks)
  • instructor/dsl/multitask.py (2 hunks)
  • instructor/function_calls.py (2 hunks)
  • instructor/patch.py (7 hunks)
  • tests/openai/test_multitask.py (2 hunks)
Additional comments: 11
docs/concepts/patching.md (1)
  • 27-31:
    The change from Mode.TOOL_CALL to Mode.TOOLS in the client instantiation is consistent with the updates mentioned in the pull request summary and reflects the new operational mode for streaming multitask operations.
instructor/dsl/multitask.py (5)
  • 3-5:
    The import of Mode from instructor.function_calls is correct and necessary for the new mode parameter in MultiTaskBase methods.

  • 12-14:
    The from_streaming_response method correctly accepts the mode parameter and uses it to extract JSON chunks based on the mode.

  • 34-51:
    The extract_json method has been updated to handle different modes, but there is a potential issue with the assignment of json_chunk inside the if-else blocks. The := operator, known as the walrus operator, is only available in Python 3.8 and later. If the codebase is intended to support earlier versions of Python, this will cause a syntax error. (See the standalone walrus example after this list.)

  • 31-31:
    The use of yield in tasks_from_chunks is correct and aligns with the streaming functionality described in the summary.

  • 47-49:
    The NotImplementedError is raised correctly for unsupported modes, ensuring that the function fails fast and clearly when encountering an unexpected mode.
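
For readers unfamiliar with the operator flagged above, here is a standalone illustration of the walrus pattern (a generic sketch, not the actual extract_json body): the assignment expression binds and tests a value in one step, which is why Python 3.8+ is required.

```python
import re

def extract_json(lines):
    # := binds the match object and tests it in the same expression
    # (Python 3.8+; earlier interpreters raise a SyntaxError here).
    for line in lines:
        if (match := re.search(r"\{.*\}", line)) is not None:
            yield match.group(0)

print(list(extract_json(['data: {"name": "Jason"}', "keep-alive"])))
# ['{"name": "Jason"}']
```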

instructor/function_calls.py (4)
  • 190-194: The from_response method has been updated to include a new parameter stream_multitask. This parameter is used to determine if the response should be processed as a streaming multitask operation. Ensure that all calls to this method are updated to handle the new parameter appropriately.

  • 206-209: The conditional logic to handle streaming multitask operations is correctly implemented. When stream_multitask is True, the method delegates to from_streaming_response with the completion and mode parameters. This aligns with the changes described in the summary. (A sketch of this delegation follows this list.)

  • 211-213: The message variable is accessed using dot notation (completion.choices[0].message), which assumes that completion is an object with attribute access. However, the previous version of from_response treated completion as a dictionary (completion["choices"][0]["message"]). This discrepancy could lead to an AttributeError if completion is not an object with the expected attributes. Verify that the completion parameter will always be an object that supports attribute access in the updated codebase.

  • 213-213: The assertion within the Mode.FUNCTIONS branch checks the function name against the schema. This is a good practice to ensure that the response corresponds to the expected function. However, the code assumes that message.function_call and message.function_call.name are accessible, which may not be the case if the response format is different. It's important to verify that the response format from OpenAI's API will always match this structure.
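
A hedged sketch of the delegation described in these comments; the class name matches instructor/function_calls.py, but the signature is trimmed to the parameters discussed here, and the bodies are elided where the conversation gives no detail:

```python
class OpenAISchema:
    @classmethod
    def from_streaming_response(cls, completion, mode):
        """Incrementally parse tasks out of a streamed completion (body elided)."""
        ...

    @classmethod
    def from_response(cls, completion, mode, stream_multitask: bool = False):
        if stream_multitask:
            # Streaming multitask: hand the raw stream to the incremental parser.
            return cls.from_streaming_response(completion, mode=mode)
        # Non-streaming path: attribute access on the v1 client response object,
        # not the dictionary access used by the pre-v1 code.
        message = completion.choices[0].message
        ...
```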

tests/openai/test_multitask.py (1)
  • 36-39: The same typo identified in the previous review is still present in the string concatenation. The word "entitites" should be corrected to "entities", and spaces should be added at the ends of lines 37 and 38 to ensure proper spacing in the concatenated string.

```diff
-                        "Correctly segment it into entitites"
+                        "Correctly segment it into entities "
```

@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 065c754 and 4c76121.
Files selected for processing (1)
  • examples/streaming_multitask/streaming_multitask.py (2 hunks)
Additional comments: 3
examples/streaming_multitask/streaming_multitask.py (3)
  • 16-17:
    The addition of the age field to the User model is consistent with the summary and should be reflected in the documentation and any serialization/deserialization logic that involves the User model.

  • 21-26:
    The refactoring of the stream_extract function to return Users and the removal of the cls parameter is consistent with the summary. Ensure that all calls to stream_extract are updated to reflect these changes.

  • 45-50:
    The call to stream_extract in the example usage correctly reflects the updated function signature without the cls parameter. This is consistent with the changes described in the summary.

@Anmol6 requested a review from @jxnl on November 25, 2023 at 23:48.
@Anmol6 changed the title from "Support Streaming MultiTask with response_model (resolves #213)" to "Support Streaming MultiTask with response_model" on Nov 26, 2023.
@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 4c76121 and 1a27b0f.
Files selected for processing (3)
  • instructor/function_calls.py (2 hunks)
  • instructor/patch.py (7 hunks)
  • tests/openai/test_multitask.py (2 hunks)
Additional comments: 5
instructor/patch.py (4)
  • 99-100: The previous comment chain indicates that the else block at lines 99-100 is unnecessary and should be removed. This change has not been reflected in the provided hunk. If the codebase has not changed in a way that would require this else block, it should be removed as previously suggested.

```diff
-        else:
-            raise ValueError(f"Invalid patch mode: {mode}")
```
  • 107-134: The process_response function has been updated to handle streaming responses. The stream parameter is now used to determine if the response should be processed as a stream and whether the _raw_response attribute should be set. This change aligns with the pull request's goal to enhance streaming and multitasking capabilities. Ensure that all calls to process_response have been updated to pass the stream parameter correctly.

  • 148-158: The retry_async function now accepts and passes the stream parameter to process_response. This change is consistent with the pull request's goal to support streaming responses. Ensure that all calls to retry_async have been updated to include the stream parameter where necessary.

  • 185-195: Similarly, the retry_sync function has been updated to handle the stream parameter, ensuring that synchronous retries are compatible with the new streaming functionality. Ensure that all calls to retry_sync have been updated to include the stream parameter where necessary.

tests/openai/test_multitask.py (1)
  • 34-36: The same issue from the previous review comments about the missing spaces and the typo in the word "entities" is still present in the new code. This needs to be corrected to ensure proper string formatting and spelling.

```diff
-                        "Correctly segment it into entitites"
+                        "Correctly segment it into entities "
```

@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 1a27b0f and a0fc78b.
Files selected for processing (1)
  • docs/concepts/lists.md (2 hunks)

@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between a0fc78b and 16b443c.
Files selected for processing (2)
  • docs/concepts/lists.md (2 hunks)
  • instructor/patch.py (6 hunks)
Additional comments: 4
docs/concepts/lists.md (2)
  • 65-68:
    The type hint for Users as Iterable[User] is appropriate for the streaming context described in the documentation.

  • 89-94:
    The streaming example provided is a simplified demonstration of the new streaming multitask functionality. It is assumed that the reader will understand the context and that a more detailed example will be provided in a more comprehensive documentation section or with additional code comments.

instructor/patch.py (2)
  • 145-155: The stream parameter is correctly passed to the process_response function within the retry_async function. This aligns with the new streaming functionality and ensures that the streaming behavior is consistent across asynchronous retries.

  • 182-192: Similarly, the stream parameter is correctly passed to the process_response function within the retry_sync function. This ensures that the streaming behavior is consistent across synchronous retries as well.

@jxnl (Collaborator) left a comment

Looks awesome! Let me know whenever you're down to merge.

@jxnl (Collaborator) commented Nov 26, 2023

Looks like coverage fell, but that's because the OpenAI tests are not run; no need to mock that.

@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 16b443c and 7ae58fb.
Files selected for processing (1)
  • docs/concepts/lists.md (4 hunks)
Additional comments: 2
docs/concepts/lists.md (2)
  • 1-4: The title and introduction seem appropriate for the document's content.

  • 13-54: The use of Iterable for defining the Users class and its implementation in the client.chat.completions.create call is correct and follows the Python typing conventions.

Comment on lines 64 to 83 of docs/concepts/lists.md:

````diff
 ## Streaming Tasks
 
-Since a `MultiTask(T)` is well contrained to `tasks: List[T]` we can make assuptions on how tokens are used and provide a helper method that allows you generate tasks as the the tokens are streamed in
+We can also generate tasks as the tokens are streamed in by defining an `Iterable[T]` type.
 
 Lets look at an example in action with the same class
 
 ```python hl_lines="6 26"
-MultiUser = instructor.MultiTask(User)
+from typing import Iterable
+
+Users = Iterable[User]
 
-completion = client.chat.completions.create(
+users = client.chat.completions.create(
     model="gpt-4",
     temperature=0.1,
     stream=True,
-    response_model=MultiUser,
+    response_model=Users,
     messages=[
         {
             "role": "system",
````
@coderabbitai (bot, Contributor) commented:

Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [67-100]

The streaming example uses an undefined variable input, which may lead to confusion. It should be defined or the example should be clarified to indicate that input is a placeholder for actual data.

Comment on lines 95 to 100 of docs/concepts/lists.md:

```diff
     max_tokens=1000,
 )
 
-for user in MultiUser.from_streaming_response(completion):
+for user in users:
     assert isinstance(user, User)
     print(user)
```
@coderabbitai (bot, Contributor) commented:

The note about streaming being a prototype is important and should be highlighted or made more prominent to ensure users are aware of its experimental status.

@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 7ae58fb and eb30b7e.
Files ignored due to filter (1)
  • pyproject.toml
Files selected for processing (2)
  • instructor/function_calls.py (2 hunks)
  • tests/openai/test_multitask.py (2 hunks)
Files skipped from review due to trivial changes (1)
  • instructor/function_calls.py
Additional comments: 1
tests/openai/test_multitask.py (1)
  • 127-133: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [105-133]

Verify that the MultiUser.from_streaming_response method is compatible with the Mode.FUNCTIONS parameter as used in the test_multi_user_legacy function.

@coderabbitai (bot, Contributor) left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between eb30b7e and 5bf716e.
Files selected for processing (2)
  • docs/concepts/lists.md (3 hunks)
  • instructor/patch.py (6 hunks)
Additional comments: 9
docs/concepts/lists.md (4)
  • 1-4: The introduction is clear and concise, providing a good lead-in to the topic of multi-task and streaming.

  • 67-67: Verify if the markdown renderer supports hl_lines for highlighting lines in code blocks, as this feature is not standard in all markdown renderers.

  • 79-79: The use of the undefined variable input in the example code is still present and should be addressed.

  • 91-97: The note about streaming being a prototype should be made more prominent to ensure users are aware of its experimental status.

instructor/patch.py (5)
  • 62-62: The error message in the NotImplementedError is misleading, because the check is for whether the response_model is a subclass of MultiTaskBase, not whether it is iterable. The error message should be updated to accurately reflect the condition being checked.

```diff
-raise NotImplementedError("stream=True is not supported when using response_model parameter for non-iterables")
+raise NotImplementedError("stream=True is not supported unless the response_model is a subclass of MultiTaskBase")
```
  • 97-98: The else block at lines 97-98 is redundant, since all possible values of mode are already handled by the preceding if and elif statements. If a new mode is added in the future, it should be handled explicitly. The else block can be removed to avoid confusion and potential dead code.

```diff
-        else:
-            raise ValueError(f"Invalid patch mode: {mode}")
```
  • 123-133: The process_response function has been updated to handle multitask models and return tasks if stream=False. Ensure that the logic for handling the stream and multitask models is correct and that the raw response is returned when stream is False. (A sketch of this dispatch follows this list.)

  • 147-157: The retry_async function has been updated to include a stream parameter and pass it to the process_response function. Ensure that the stream parameter is being handled correctly and passed to the process_response function as intended.

  • 184-194: The retry_sync function has been updated to include a stream parameter and pass it to the process_response function. Ensure that the stream parameter is being handled correctly and passed to the process_response function as intended.
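
Pulling these threads together, the stream dispatch in `process_response` plausibly reduces to the following sketch; the import path and the trimmed signature are assumptions based on the review comments, not the literal patch.py code:

```python
from instructor.dsl.multitask import MultiTaskBase  # assumed import path

def process_response(response, *, response_model, stream: bool, mode):
    # Multitask models get special handling: stream=True yields tasks as
    # tokens arrive; stream=False parses once and returns the task list.
    if response_model is not None and issubclass(response_model, MultiTaskBase):
        if stream:
            return response_model.from_streaming_response(response, mode=mode)
        return response_model.from_response(response, mode=mode).tasks
    ...  # non-multitask models are parsed from the completed response
```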

@jxnl merged commit 66f2f63 into instructor-ai:main on Nov 26, 2023.