Support Streaming MultiTask with response_model #221
Conversation
Walkthrough: The changes revolve around enhancing the streaming functionality to work with `response_model`.
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (4)
- examples/streaming_multitask/streaming_multitask.py (2 hunks)
- instructor/function_calls.py (2 hunks)
- instructor/patch.py (6 hunks)
- tests/openai/test_multitask.py (2 hunks)
Additional comments: 12
examples/streaming_multitask/streaming_multitask.py (3)

16-17: The `User` model now includes an `age` field. Ensure that all uses of the `User` model throughout the codebase are updated to handle this new field.

21-26: The `stream_extract` function has been successfully refactored to return an `Iterable[User]` and no longer requires the `cls` parameter. This change simplifies the function's interface.

45-50: The usage of `stream_extract` in the loop has been updated to match its new signature, and the loop correctly processes the iterable of `User` objects.
instructor/function_calls.py (3)

190-194: The new parameter `stream_multitask` has been added to the `from_response` method. Ensure that all calls to this method have been updated to include this new parameter where necessary, or that the default behavior is acceptable in existing calls.

206-209: The conditional handling for `stream_multitask` is implemented correctly. However, ensure that the `from_streaming_response` method is defined and properly handles the streaming response as expected.

211-213: The `message` variable is accessed with dot notation, which assumes that `completion.choices[0].message` is an object with attributes rather than a dictionary. This is inconsistent with the previous usage in the `from_response` method, where `completion` is treated as a dictionary. Verify that `completion` is indeed an object with the expected structure, or adjust the code to use dictionary accessors.
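A minimal sketch of the kind of normalization that concern points at; the helper name is hypothetical and not part of the PR:

```python
from typing import Any

def get_message(completion: Any) -> Any:
    """Read the first message from either a dict-shaped or object-shaped completion."""
    if isinstance(completion, dict):
        return completion["choices"][0]["message"]
    return completion.choices[0].message
```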
instructor/patch.py (5)

3-6: The imports for `MultiTask`, `MultiTaskBase`, `get_origin`, `get_args`, and `Iterable` are correctly added to support the new streaming multitask functionality. These imports are necessary for the changes described in the summary.

52-60: The `handle_response_model` function has been updated to handle the `Iterable` type in the `response_model`. The logic to create a `MultiTask` instance based on the iterable element class is correct and aligns with the summary. However, there is a potential issue with the type ignore comment on line 59. It's important to ensure that the `openai_schema` function can handle the `response_model` correctly and that the type ignore is justified.
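For readers unfamiliar with how an `Iterable[User]` annotation gets unwrapped, here is a standalone demonstration of `get_origin`/`get_args`; the `User` class is a stand-in, not the PR's model:

```python
import collections.abc
from typing import Iterable, get_args, get_origin

class User:
    """Stand-in element model, not the PR's pydantic model."""

Users = Iterable[User]

# get_origin unwraps the generic alias; get_args returns the element class
# that would feed the MultiTask wrapper.
print(get_origin(Users) is collections.abc.Iterable)  # True
print(get_args(Users) == (User,))                     # True
```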
126-137: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [112-136]
The `process_response` function now includes a `stream` parameter and handles the response differently if `stream` is set to `True` and the `response_model` is a subclass of `MultiTaskBase`. This is consistent with the summary and allows for the correct processing of streaming multitask responses.
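In outline, the branch being described looks something like this rough sketch (not the PR's exact code; the stub class stands in for instructor's `MultiTaskBase`):

```python
from typing import Any, Iterable, Type

class MultiTaskBase:
    """Stub standing in for instructor's MultiTaskBase."""

    @classmethod
    def from_streaming_response(cls, response: Iterable[Any], mode: str) -> Iterable[Any]:
        # In instructor this assembles tasks from streamed JSON chunks.
        raise NotImplementedError

def process_response(response: Any, response_model: Type[Any], stream: bool, mode: str) -> Any:
    # Streaming only applies to multitask wrappers, which know how to
    # build complete objects from partial JSON chunks as they arrive.
    if stream and issubclass(response_model, MultiTaskBase):
        return response_model.from_streaming_response(response, mode=mode)
    # Otherwise fall through to the ordinary single-completion parsing path.
    ...
```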
150-163: The `retry_async` function has been updated to include the `stream` parameter and pass it to the `process_response` function. This ensures that the streaming behavior is consistently applied across retries. However, there is a potential issue with the error handling in the except block. The `dump_message` function is called with `response.choices[0].message`, but there is no check to ensure that `response.choices` and `response.choices[0]` are not empty, which could lead to an `IndexError` if the response does not contain any choices.
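One way to guard that access, as an illustration of the fix being requested (not code from the PR):

```python
# Guard against a missing or empty choices list before reaching into it.
choices = getattr(response, "choices", None)
if not choices:
    raise ValueError("completion response contained no choices")
message = choices[0].message
```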
187-200: Similarly, the `retry_sync` function has been updated to handle the `stream` parameter. The same potential issue with error handling in the except block applies here as well. It is necessary to verify that `response.choices` and `response.choices[0]` are not empty before attempting to access `response.choices[0].message`.

tests/openai/test_multitask.py (1)

52-55: The `stream_extract` function in `test_multi_user_legacy` is correctly left unchanged as per the summary. However, the `cls` parameter is no longer used in the new `stream_extract` function, and the summary states that it was removed. If the intention was to remove it from the legacy function as well, this needs to be addressed. Otherwise, the legacy function is fine as is.
Well, this is amazing. Do you mind adding a little bit to the documentation?
Also, does this work for JSON mode or tool calls too?
just merged from main, updating it right now!
```python
Users = Iterable[User]


def stream_extract(input: str) -> Iterable[User]:
```
Suggested change:

```diff
-def stream_extract(input: str) -> Iterable[User]:
+def stream_extract(input: str) -> Users:
```
this would work too, right? or even `response_model=Iterable[User]`
love it
yup both work
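A tiny illustration of the two equivalent spellings discussed above (sketch only; the function names are hypothetical):

```python
from typing import Iterable

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

Users = Iterable[User]

# Both annotations describe the same response_model:
def stream_extract_a(input: str) -> Iterable[User]: ...
def stream_extract_b(input: str) -> Users: ...
```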
at least do something that just says only functions mode is allowed, etc.
Review Status
Actionable comments generated: 5
Configuration used: CodeRabbit UI
Files selected for processing (5)
- docs/concepts/patching.md (1 hunks)
- instructor/dsl/multitask.py (2 hunks)
- instructor/function_calls.py (2 hunks)
- instructor/patch.py (7 hunks)
- tests/openai/test_multitask.py (2 hunks)
Additional comments: 11
docs/concepts/patching.md (1)

27-31: The change from `Mode.TOOL_CALL` to `Mode.TOOLS` in the `client` instantiation is consistent with the updates mentioned in the pull request summary and reflects the new operational mode for streaming multitask operations.

instructor/dsl/multitask.py (5)

3-5: The import of `Mode` from `instructor.function_calls` is correct and necessary for the new `mode` parameter in `MultiTaskBase` methods.

12-14: The `from_streaming_response` method correctly accepts the `mode` parameter and uses it to extract JSON chunks based on the mode.

34-51: The `extract_json` method has been updated to handle different modes, but there is a potential issue with the assignment of `json_chunk` inside the if-else blocks. The `:=` operator is used, which is known as the walrus operator and is only available in Python 3.8 and later. If the codebase is intended to support earlier versions of Python, this will cause a syntax error.

31-31: The use of `yield` in `tasks_from_chunks` is correct and aligns with the streaming functionality described in the summary.

47-49: The `NotImplementedError` is raised correctly for unsupported modes, ensuring that the function fails fast and clearly when encountering an unexpected mode.
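Putting those observations together, a hedged sketch of what a mode-aware `extract_json` can look like (illustrative only; the chunk shapes are assumptions about the OpenAI streaming format, not the PR's exact code):

```python
from typing import Any, Generator, Iterable

def extract_json(completion: Iterable[Any], mode: str) -> Generator[str, None, None]:
    """Yield raw JSON fragments from streaming chunks, per mode."""
    for chunk in completion:
        delta = chunk.choices[0].delta
        if mode == "functions":
            # walrus operator (Python 3.8+), the construct the review flags
            if (json_chunk := getattr(delta, "function_call", None)) is not None:
                yield json_chunk.arguments
        elif mode == "json":
            if (json_chunk := getattr(delta, "content", None)) is not None:
                yield json_chunk
        else:
            # fail fast on unexpected modes, as the review recommends
            raise NotImplementedError(f"Mode {mode} is not supported")
```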
instructor/function_calls.py (4)

190-194: The `from_response` method has been updated to include a new parameter `stream_multitask`. This parameter is used to determine if the response should be processed as a streaming multitask operation. Ensure that all calls to this method are updated to handle the new parameter appropriately.

206-209: The conditional logic to handle streaming multitask operations is correctly implemented. When `stream_multitask` is `True`, the method delegates to `from_streaming_response` with the `completion` and `mode` parameters. This aligns with the changes described in the summary.

211-213: The `message` variable is accessed using dot notation (`completion.choices[0].message`), which assumes that `completion` is an object with attribute access. However, the previous version of `from_response` treated `completion` as a dictionary (`completion["choices"][0]["message"]`). This discrepancy could lead to an `AttributeError` if `completion` is not an object with the expected attributes. Verify that the `completion` parameter will always be an object that supports attribute access in the updated codebase.

213-213: The assertion within the `Mode.FUNCTIONS` branch checks the function name against the schema. This is a good practice to ensure that the response corresponds to the expected function. However, the code assumes that `message.function_call` and `message.function_call.name` are accessible, which may not be the case if the response format is different. It's important to verify that the response format from OpenAI's API will always match this structure.
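A defensive variant of that check, purely as an illustration of the reviewer's point (the function and `expected_name` parameter are hypothetical, not the PR's code):

```python
from typing import Any

def validate_function_call(completion: Any, expected_name: str) -> None:
    """Assert on the function name only after verifying the structure exists."""
    message = completion.choices[0].message
    function_call = getattr(message, "function_call", None)
    assert function_call is not None, "response contained no function_call"
    assert function_call.name == expected_name, (
        f"expected function {expected_name!r}, got {function_call.name!r}"
    )
```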
tests/openai/test_multitask.py (1)

36-39: The same typo identified in the previous review is still present in the string concatenation. The word "entitites" should be corrected to "entities", and spaces should be added at the end of lines 37 and 38 to ensure proper spacing in the concatenated string.

```diff
-"Correctly segment it into entitites"
+"Correctly segment it into entities "
```
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- examples/streaming_multitask/streaming_multitask.py (2 hunks)
Additional comments: 3
examples/streaming_multitask/streaming_multitask.py (3)

16-17: The addition of the `age` field to the `User` model is consistent with the summary and should be reflected in the documentation and any serialization/deserialization logic that involves the `User` model.

21-26: The refactoring of the `stream_extract` function to return `Users` and the removal of the `cls` parameter is consistent with the summary. Ensure that all calls to `stream_extract` are updated to reflect these changes.

45-50: The call to `stream_extract` in the example usage correctly reflects the updated function signature without the `cls` parameter. This is consistent with the changes described in the summary.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (3)
- instructor/function_calls.py (2 hunks)
- instructor/patch.py (7 hunks)
- tests/openai/test_multitask.py (2 hunks)
Additional comments: 5
instructor/patch.py (4)

99-100: The previous comment chain indicates that the `else` block at lines 99-100 is unnecessary and should be removed. This change has not been reflected in the provided hunk. If the codebase has not changed in a way that would require this `else` block, it should be removed as previously suggested.

```diff
-        else:
-            raise ValueError(f"Invalid patch mode: {mode}")
```

107-134: The `process_response` function has been updated to handle streaming responses. The `stream` parameter is now used to determine if the response should be processed as a stream and whether the `_raw_response` attribute should be set. This change aligns with the pull request's goal to enhance streaming and multitasking capabilities. Ensure that all calls to `process_response` have been updated to pass the `stream` parameter correctly.

148-158: The `retry_async` function now accepts and passes the `stream` parameter to `process_response`. This change is consistent with the pull request's goal to support streaming responses. Ensure that all calls to `retry_async` have been updated to include the `stream` parameter where necessary.

185-195: Similarly, the `retry_sync` function has been updated to handle the `stream` parameter, ensuring that synchronous retries are compatible with the new streaming functionality. Ensure that all calls to `retry_sync` have been updated to include the `stream` parameter where necessary.

tests/openai/test_multitask.py (1)

34-36: The same issue from the previous review comments about missing spaces and a typo in the word "entities" is still present in the new code. This needs to be corrected to ensure proper string formatting and spelling.

```diff
-"Correctly segment it into entitites"
+"Correctly segment it into entities "
```
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (2)
- docs/concepts/lists.md (2 hunks)
- instructor/patch.py (6 hunks)
Additional comments: 4
docs/concepts/lists.md (2)

65-68: The type hint for `Users` as `Iterable[User]` is appropriate for the streaming context described in the documentation.

89-94: The streaming example provided is a simplified demonstration of the new streaming multitask functionality. It is assumed that the reader will understand the context and that a more detailed example will be provided in a more comprehensive documentation section or with additional code comments.

instructor/patch.py (2)

145-155: The `stream` parameter is correctly passed to the `process_response` function within the `retry_async` function. This aligns with the new streaming functionality and ensures that the streaming behavior is consistent across asynchronous retries.

182-192: Similarly, the `stream` parameter is correctly passed to the `process_response` function within the `retry_sync` function. This ensures that the streaming behavior is consistent across synchronous retries as well.
Looks awesome! Let me know whenever you're down to merge.
Looks like coverage fell, but that's because the OpenAI tests are not run; no need to mock that.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (1)
- docs/concepts/lists.md (4 hunks)
Additional comments: 2
docs/concepts/lists.md (2)
1-4: The title and introduction seem appropriate for the document's content.
13-54: The use of `Iterable` for defining the `Users` class and its implementation in the `client.chat.completions.create` call is correct and follows the Python typing conventions.
The diff under review in `docs/concepts/lists.md`:

````diff
 ## Streaming Tasks

-Since a `MultiTask(T)` is well contrained to `tasks: List[T]` we can make assuptions on how tokens are used and provide a helper method that allows you generate tasks as the the tokens are streamed in
+We can also generate tasks as the tokens are streamed in by defining an `Iterable[T]` type.

 Lets look at an example in action with the same class

 ```python hl_lines="6 26"
-MultiUser = instructor.MultiTask(User)
+from typing import Iterable
+
+Users = Iterable[User]

-completion = client.chat.completions.create(
+users = client.chat.completions.create(
     model="gpt-4",
     temperature=0.1,
     stream=True,
-    response_model=MultiUser,
+    response_model=Users,
     messages=[
         {
             "role": "system",
````
> Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [67-100]

The streaming example uses an undefined variable `input`, which may lead to confusion. It should be defined, or the example should be clarified to indicate that `input` is a placeholder for actual data.
```diff
     max_tokens=1000,
 )

-for user in MultiUser.from_streaming_response(completion):
+for user in users:
     assert isinstance(user, User)
     print(user)
```
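Consolidating the two hunks above, a self-contained version of the updated docs example might read as follows (the prompt text and the `instructor.patch` setup are assumptions based on the surrounding discussion, not copied from the PR):

```python
from typing import Iterable

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.patch(OpenAI())

class User(BaseModel):
    name: str
    age: int

users = client.chat.completions.create(
    model="gpt-4",
    temperature=0.1,
    stream=True,
    response_model=Iterable[User],
    messages=[
        {"role": "system", "content": "Extract users from the text."},
        {"role": "user", "content": "Jason is 10 and Sarah is 34."},
    ],
    max_tokens=1000,
)

# Each User is yielded as soon as its tokens have streamed in.
for user in users:
    assert isinstance(user, User)
    print(user)
```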
The note about streaming being a prototype is important and should be highlighted or made more prominent to ensure users are aware of its experimental status.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- pyproject.toml
Files selected for processing (2)
- instructor/function_calls.py (2 hunks)
- tests/openai/test_multitask.py (2 hunks)
Files skipped from review due to trivial changes (1)
- instructor/function_calls.py
Additional comments: 1
tests/openai/test_multitask.py (1)

127-133: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [105-133]
Verify that the `MultiUser.from_streaming_response` method is compatible with the `Mode.FUNCTIONS` parameter as used in the `test_multi_user_legacy` function.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (2)
- docs/concepts/lists.md (3 hunks)
- instructor/patch.py (6 hunks)
Additional comments: 9
docs/concepts/lists.md (4)

1-4: The title and introduction seem appropriate for the document's content.

67-67: Verify if the markdown renderer supports `hl_lines` for highlighting lines in code blocks, as this feature is not standard in all markdown renderers.

79-79: The use of the undefined variable `input` in the example code is still present and should be addressed.

91-97: The note about streaming being a prototype should be made more prominent to ensure users are aware of its experimental status.

instructor/patch.py (5)

62-62: The error message in the `NotImplementedError` is misleading because the check is for whether the `response_model` is a subclass of `MultiTaskBase`, not whether it is iterable. The error message should be updated to accurately reflect the condition being checked.

```diff
-raise NotImplementedError("stream=True is not supported when using response_model parameter for non-iterables")
+raise NotImplementedError("stream=True is not supported unless the response_model is a subclass of MultiTaskBase")
```

97-98: The `else` block at lines 97-98 is redundant since all possible values of `mode` are already handled by the preceding `if` and `elif` statements. If a new mode is added in the future, it should be handled explicitly. The `else` block can be removed to avoid confusion and potential dead code.

```diff
-        else:
-            raise ValueError(f"Invalid patch mode: {mode}")
```

123-133: The `process_response` function has been updated to handle multitask models and return tasks if `stream=False`. Ensure that the logic for handling the stream and multitask models is correct and that the raw response is returned when `stream` is `False`.

147-157: The `retry_async` function has been updated to include a `stream` parameter and pass it to the `process_response` function. Ensure that the `stream` parameter is being handled correctly and passed to the `process_response` function as intended.

184-194: The `retry_sync` function has been updated to include a `stream` parameter and pass it to the `process_response` function. Ensure that the `stream` parameter is being handled correctly and passed to the `process_response` function as intended.
Resolve #213