fix: Refactor reasking logic #1071

jxnl · 2024-10-12T02:59:39Z

Important

Refactor reasking and retry logic by introducing reask.py for mode-specific handling and updating retry.py for improved exception handling and modularity.

Reask Logic:
- Introduced reask.py with functions like reask_anthropic_tools, reask_cohere_tools, reask_gemini_tools for mode-specific reasking.
- Added handle_reask_kwargs() to select appropriate reask function based on mode.
Retry Logic:
- Refactored retry_sync() and retry_async() in retry.py to use handle_reask_kwargs() for exception handling.
- Added initialize_retrying() and initialize_usage() to set up retrying and usage tracking.
Exceptions:
- Updated InstructorRetryException in exceptions.py to include create_kwargs.
Misc:
- Minor changes in patch.py and templating.py for logging and templating adjustments.
- Fixed JSON extraction logic in iterable.py and partial.py.

^{This description was created by}^{for 28f7513. It will automatically update as commits are pushed.}

ellipsis-dev

👍 Looks good to me! Reviewed everything up to d56bc5e in 27 seconds

More details

Looked at 800 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 10 drafted comments based on config settings.

1. instructor/reask.py:24

Draft comment:
Unnecessary copying of kwargs. Consider modifying kwargs directly unless there's a specific reason to keep the original unchanged. This comment applies to other similar instances in this file.
Reason this comment was not posted:
Confidence changes required: 50%
The code in reask.py and retry.py has several instances where kwargs is copied unnecessarily. This can lead to performance issues due to unnecessary memory usage.

2. instructor/reask.py:71

Draft comment:
Unnecessary copying of kwargs. Consider modifying kwargs directly unless there's a specific reason to keep the original unchanged. This comment applies to other similar instances in this file.
Reason this comment was not posted:
Confidence changes required: 50%
The code in reask.py and retry.py has several instances where kwargs is copied unnecessarily. This can lead to performance issues due to unnecessary memory usage.

3. instructor/reask.py:153

Draft comment:
Unnecessary copying of kwargs. Consider modifying kwargs directly unless there's a specific reason to keep the original unchanged. This comment applies to other similar instances in this file.
Reason this comment was not posted:
Confidence changes required: 50%
The code in reask.py and retry.py has several instances where kwargs is copied unnecessarily. This can lead to performance issues due to unnecessary memory usage.

4. instructor/reask.py:169

Draft comment:
Unnecessary copying of kwargs. Consider modifying kwargs directly unless there's a specific reason to keep the original unchanged. This comment applies to other similar instances in this file.
Reason this comment was not posted:
Confidence changes required: 50%
The code in reask.py and retry.py has several instances where kwargs is copied unnecessarily. This can lead to performance issues due to unnecessary memory usage.

5. instructor/reask.py:192

Draft comment:
Unnecessary copying of kwargs. Consider modifying kwargs directly unless there's a specific reason to keep the original unchanged. This comment applies to other similar instances in this file.
Reason this comment was not posted:
Confidence changes required: 50%
The code in reask.py and retry.py has several instances where kwargs is copied unnecessarily. This can lead to performance issues due to unnecessary memory usage.

6. instructor/reask.py:214

Draft comment:
Unnecessary copying of kwargs. Consider modifying kwargs directly unless there's a specific reason to keep the original unchanged. This comment applies to other similar instances in this file.
Reason this comment was not posted:
Confidence changes required: 50%
The code in reask.py and retry.py has several instances where kwargs is copied unnecessarily. This can lead to performance issues due to unnecessary memory usage.

7. instructor/reask.py:236

Draft comment:
Unnecessary copying of kwargs. Consider modifying kwargs directly unless there's a specific reason to keep the original unchanged. This comment applies to other similar instances in this file.
Reason this comment was not posted:
Confidence changes required: 50%
The code in reask.py and retry.py has several instances where kwargs is copied unnecessarily. This can lead to performance issues due to unnecessary memory usage.

8. instructor/reask.py:253

Draft comment:
Unnecessary copying of kwargs. Consider modifying kwargs directly unless there's a specific reason to keep the original unchanged. This comment applies to other similar instances in this file.
Reason this comment was not posted:
Confidence changes required: 50%
The code in reask.py and retry.py has several instances where kwargs is copied unnecessarily. This can lead to performance issues due to unnecessary memory usage.

9. instructor/reask.py:27

Draft comment:
Assertions should always have an error message. Please provide a descriptive error message for this assertion.

assert isinstance(response, Message), "Response must be a valid Anthropic Message instance."

Reason this comment was not posted:
Confidence changes required: 80%
The code has multiple instances where assertions lack error messages. Assertions should always have a clear error message to aid in debugging.

10. instructor/reask.py:74

Draft comment:
Assertions should always have an error message. Please provide a descriptive error message for this assertion.

assert isinstance(response, Message), "Response must be a valid Anthropic Message instance."

Reason this comment was not posted:
Confidence changes required: 80%
The code has multiple instances where assertions lack error messages. Assertions should always have a clear error message to aid in debugging.

Workflow ID: wflow_1PWK3csLQP2DGuCQ

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 15a5cb8 in 15 seconds

More details

Looked at 60 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. instructor/retry.py:99

Draft comment:
Good use of Python 3.10+ syntax with type[T_Model] | None. Ensure this is consistent across the codebase.
Reason this comment was not posted:
Confidence changes required: 20%
The change from Optional[Type] to Type | None is a good update for Python 3.10+ syntax, but it should be consistent across the file.

2. instructor/retry.py:178

Draft comment:
Good use of Python 3.10+ syntax with type[T_Model] | None. Ensure this is consistent across the codebase.
Reason this comment was not posted:
Confidence changes required: 20%
The change from Optional[Type] to Type | None is a good update for Python 3.10+ syntax, but it should be consistent across the file.

3. instructor/retry.py:97

Draft comment:
Function names retry_sync and retry_async should follow a consistent naming pattern, such as initialize_retrying and initialize_usage. Consider renaming them to initialize_retry_sync and initialize_retry_async.
Reason this comment was not posted:
Confidence changes required: 80%
The function names retry_sync and retry_async are not following a consistent naming pattern with other functions in the file, such as initialize_retrying and initialize_usage.

Workflow ID: wflow_MJAqbw421Q7boybS

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 1f3e3fe in 12 seconds

More details

Looked at 15 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. instructor/reask.py:88

Draft comment:
The attempt_number parameter was moved to the end and given a default value. Ensure that this change does not affect any existing calls to reask_cohere_tools that might have used positional arguments.
Reason this comment was not posted:
Confidence changes required: 50%
The attempt_number parameter in reask_cohere_tools is set to a default value of 1, which is a good practice for optional parameters. However, the placement of this parameter was changed, which might affect the function's usage if it was called with positional arguments.

2. instructor/reask.py:84

Draft comment:
Function names should follow a consistent pattern. Consider renaming functions to maintain consistency, such as reask_<platform>_<type> or reask_<platform>. This applies to other functions in this file as well.
Reason this comment was not posted:
Confidence changes required: 80%
The function names in this file are inconsistent. Some use the pattern reask_<platform>_<type> while others use reask_<platform>. This inconsistency can lead to confusion.

Workflow ID: wflow_xr3WKTLZVoLC94HA

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

cloudflare-workers-and-pages · 2024-10-12T08:41:05Z

Deploying instructor-py with Cloudflare Pages

Latest commit:	`28f7513`
Status:	✅ Deploy successful!
Preview URL:	https://3e4407d0.instructor-py.pages.dev
Branch Preview URL:	https://refactor-reask.instructor-py.pages.dev

View logs

ellipsis-dev

👍 Looks good to me! Incremental review on 8f789a4 in 18 seconds

More details

Looked at 68 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. instructor/retry.py:102

Draft comment:
The addition of the context parameter is consistent with the refactoring goals. Ensure that process_response and process_response_async handle this parameter correctly.
Reason this comment was not posted:
Confidence changes required: 50%
The PR introduces a new parameter context in the retry_sync and retry_async functions, which is used in the process_response and process_response_async functions. This change aligns with the refactoring described in the PR description.

2. instructor/retry.py:155

Draft comment:
Ensure handle_reask_kwargs is correctly handling the new context parameter if applicable. This is crucial for the refactored reasking logic.
Reason this comment was not posted:
Confidence changes required: 50%
The handle_reask_kwargs function is used in both retry_sync and retry_async functions. The refactoring seems to focus on improving exception handling and modularity, which is consistent with the use of this function.

3. instructor/retry.py:228

Draft comment:
The addition of the context parameter in retry_async is consistent with the refactoring goals. Ensure that process_response_async handles this parameter correctly.
Reason this comment was not posted:
Confidence changes required: 50%
The retry_sync and retry_async functions have been refactored to include the context parameter, which is used in the process_response and process_response_async functions. This aligns with the refactoring goals mentioned in the PR description.

4. instructor/retry.py:179

Draft comment:
The error message in the InstructorRetryException should be formatted for clarity. Consider using a consistent format for error messages throughout the codebase.
Reason this comment was not posted:
Confidence changes required: 80%
The error messages in assertions should be formatted well for clarity and consistency.

5. instructor/retry.py:225

Draft comment:
The error message in the InstructorRetryException should be formatted for clarity. Consider using a consistent format for error messages throughout the codebase.
Reason this comment was not posted:
Confidence changes required: 80%
The error messages in assertions should be formatted well for clarity and consistency.

Workflow ID: wflow_3WHaseC8h2heyY7f

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on d2c2f06 in 12 seconds

More details

Looked at 10 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. instructor/reask.py:288

Draft comment:
Copying kwargs is unnecessary if not modified outside the function. Consider removing kwargs = kwargs.copy() for performance improvement. This applies to other functions as well.
Reason this comment was not posted:
Confidence changes required: 50%
The code has a consistent pattern of copying the kwargs dictionary at the start of each function. This is unnecessary if the kwargs are not modified in a way that affects the original dictionary outside the function. It can be removed to improve performance.

2. instructor/reask.py:288

Draft comment:
Assertions should always have an error message that is formatted well. Ensure that all assertions in the codebase, including those in other functions, have descriptive error messages.
Reason this comment was not posted:
Confidence changes required: 80%
The code has multiple instances of assertions without error messages.

Workflow ID: wflow_m6pkTtz2erkQZTdh

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on d8d521f in 21 seconds

More details

Looked at 36 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. instructor/reask.py:92

Draft comment:
Consider using kwargs.setdefault("chat_history", []).append(...) to simplify the code and avoid unnecessary reassignment.
Reason this comment was not posted:
Confidence changes required: 80%
The code in reask.py and templating.py has some issues that need to be addressed. In reask.py, the reask_cohere_tools function has a potential issue with handling chat_history. In templating.py, the handle_templating function has a logical error in handling new_kwargs when it is a list.

2. instructor/templating.py:91

Draft comment:
Return an empty dictionary instead of None to maintain consistency in return type.
Reason this comment was not posted:
Confidence changes required: 80%
The handle_templating function in templating.py has a logical error when new_kwargs is a list. The function should return an empty dictionary instead of None to maintain consistency in return type.

3. instructor/templating.py:83

Draft comment:
Function names should follow a consistent naming pattern across the codebase. Consider using snake_case for all function names for consistency.
Reason this comment was not posted:
Confidence changes required: 80%
The function reask_cohere_tools in reask.py and handle_templating in templating.py have inconsistent naming conventions. The former uses snake_case while the latter uses camelCase. Consistent naming conventions should be followed across the codebase.

4. instructor/reask.py:91

Draft comment:
Good refactoring to avoid repetition by using get and append methods for chat_history. This adheres to the DRY principle.
Reason this comment was not posted:
Confidence changes required: 0%
The code in reask_cohere_tools was refactored to avoid repetition by using get and append methods. This change adheres to the DRY principle.

Workflow ID: wflow_HivP1NIYTqEGVJ7f

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

❌ Changes requested. Incremental review on f7a6659 in 23 seconds

More details

Looked at 60 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. instructor/templating.py:104

Draft comment:
The handle_templating function should return a consistent type. Currently, it may return None if messages is not found, which can lead to unexpected behavior. Consider returning new_kwargs instead.
Reason this comment was not posted:
Confidence changes required: 80%
The handle_templating function should return a consistent type. Currently, it may return None in some cases, which can lead to unexpected behavior.

2. instructor/templating.py:13

Draft comment:
The process_message function returns different types based on conditions (e.g., gm.Content or None). Consider having a consistent return type or document the behavior clearly.
Reason this comment was not posted:
Confidence changes required: 80%
The function process_message is returning different types based on conditions, which can be confusing. It should have a consistent return type or be documented if it intentionally returns different types.

3. instructor/templating.py:112

Draft comment:
The process_message function is used here, but its return value is not consistently utilized across different usages. Ensure consistent handling of its return value or document the intended behavior.
Reason this comment was not posted:
Confidence changes required: 80%
The process_message function is used in multiple places, and its return value is being used in some cases but not others. This inconsistency should be addressed or documented.

Workflow ID: wflow_7dlNMj6YYMRMWGWO

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2024-10-12T15:25:36Z

instructor/templating.py

@@ -48,7 +53,7 @@ def process_message(message: dict[str, Any], context: dict[str, Any]) -> None:
            apply_template(part, context) if isinstance(part, str) else part
            for part in message["parts"]
        ]
-        return
+        return message


The function process_message is expected to return None, but it returns a value in some cases (e.g., for Gemini format). This inconsistency can lead to unexpected behavior. Consider standardizing the return type.

ellipsis-dev

👍 Looks good to me! Incremental review on 6f4f069 in 15 seconds

More details

Looked at 27 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. instructor/dsl/iterable.py:95

Draft comment:
Avoid using # type:ignore without a clear reason. Consider addressing the underlying type issue instead.
Reason this comment was not posted:
Confidence changes required: 50%
The type:ignore comment is used to suppress type checking errors, but it should be used sparingly and with a clear reason. In this case, it seems to be used to ignore a potential type mismatch, but it's not clear why it's necessary.

2. instructor/dsl/partial.py:166

Draft comment:
Avoid using # type:ignore without a clear reason. Consider addressing the underlying type issue instead.
Reason this comment was not posted:
Confidence changes required: 50%
The type:ignore comment is used to suppress type checking errors, but it should be used sparingly and with a clear reason. In this case, it seems to be used to ignore a potential type mismatch, but it's not clear why it's necessary.

3. instructor/dsl/iterable.py:93

Draft comment:
Add a comment explaining why # type:ignore is necessary here and in partial.py on line 166 to improve code clarity.
Reason this comment was not posted:
Confidence changes required: 60%
The # type:ignore comments are used without explanation, which can be confusing for future developers.

Workflow ID: wflow_P6ZhQft4WUHu24Tp

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ivanleomk · 2024-10-12T15:45:04Z

Vertex AI Tests

tests/llm/test_vertexai/test_format.py ....                                                                                        [ 14%]
tests/llm/test_vertexai/test_message_parser.py ....                                                                                [ 28%]
tests/llm/test_vertexai/test_modes.py ......                                                                                       [ 50%]
tests/llm/test_vertexai/test_retries.py ....                                                                                       [ 64%]
tests/llm/test_vertexai/test_simple_types.py ......                                                                                [ 85%]
tests/llm/test_vertexai/test_stream.py ....                                                                                        [100%]

Gemini Tests

tests/llm/test_gemini/evals/test_classification_enums.py ..........                                                                [ 15%]
tests/llm/test_gemini/evals/test_classification_literals.py ..........                                                             [ 31%]
tests/llm/test_gemini/evals/test_entities.py ..                                                                                    [ 34%]
tests/llm/test_gemini/evals/test_extract_users.py ......                                                                           [ 44%]
tests/llm/test_gemini/evals/test_sentiment_analysis.py ......                                                                      [ 53%]
tests/llm/test_gemini/test_format.py ....                                                                                          [ 60%]
tests/llm/test_gemini/test_list_content.py .                                                                                       [ 61%]
tests/llm/test_gemini/test_modes.py ....                                                                                           [ 68%]
tests/llm/test_gemini/test_multimodal_content.py ..                                                                                [ 71%]
tests/llm/test_gemini/test_patch.py ....                                                                                           [ 77%]
tests/llm/test_gemini/test_retries.py ....                                                                                         [ 84%]
tests/llm/test_gemini/test_roles.py .                                                                                              [ 85%]
tests/llm/test_gemini/test_simple_types.py ...                                                                                     [ 90%]
tests/llm/test_gemini/test_stream.py ......                                                                                        [100%]

Cerebras Tests

tests/llm/test_cerebras/modes.py ...........                                                                                       [100%]

Cohere Tests

tests/llm/test_cohere/test_json_schema.py ..........                             [ 62%]
tests/llm/test_cohere/test_none_response.py ..                                   [ 75%]
tests/llm/test_cohere/test_retries.py ....                                       [100%

ellipsis-dev

👍 Looks good to me! Incremental review on 28f7513 in 13 seconds

More details

Looked at 40 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. instructor/reask.py:278

Draft comment:
Remove redundant comment for clarity.
Reason this comment was not posted:
Confidence changes required: 10%
The comment on line 278 is redundant and can be removed for clarity.

2. instructor/reask.py:278

Draft comment:
The comment can be more concise:

        Mode.COHERE_JSON_SCHEMA: reask_cohere_tools,  # Same function

Reason this comment was not posted:
Confidence changes required: 50%
The comment on line 278 is not concise and can be improved by removing unnecessary words.

Workflow ID: wflow_weNtQAQRc4Yfn0dC

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

jxnl added 2 commits October 11, 2024 22:37

initial reask update

fa5af48

fix exceptions

d56bc5e

jxnl requested a review from ivanleomk October 12, 2024 02:59