-
-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Enhance multimodality #1070
Conversation
- Upgrade `Image` to support all valid OpenAI and Anthropic vision mime types - Use mime type detection rather than extensions - Add image autodetection for ease of use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 82c6846 in 35 seconds
More details
- Looked at
632
lines of code in7
files - Skipped
0
files when reviewing. - Skipped posting
11
drafted comments based on config settings.
1. instructor/multimodal.py:42
- Draft comment:
Check URL validity before usingPath(source).is_file()
to avoid false positives when the string is a valid URL. - Reason this comment was not posted:
Confidence changes required:80%
Theautodetect
method inImage
class usesPath(source).is_file()
to check if a string is a file path. This can lead to incorrect results if the string is a valid URL that coincidentally matches a file path on the system. It's better to check URL validity first.
2. instructor/multimodal.py:102
- Draft comment:
Handle cases whererequests.head
does not return aContent-Type
header or returns an incorrect one. - Reason this comment was not posted:
Confidence changes required:70%
Thefrom_url
method in theImage
class usesrequests.head
to determine the MIME type ifmimetypes.guess_type
fails. This is a good fallback, but it should handle cases where theContent-Type
header is missing or incorrect.
3. instructor/multimodal.py:64
- Draft comment:
Consider logging a warning inautodetect_safely
when autodetection fails and the source is returned as a string. - Reason this comment was not posted:
Confidence changes required:50%
Theautodetect_safely
method in theImage
class returns the source as a string if autodetection fails. This is a good fallback, but it should log a warning to inform the user that autodetection failed.
4. instructor/multimodal.py:49
- Draft comment:
Include the source value in theValueError
message inautodetect
for better debugging information. - Reason this comment was not posted:
Confidence changes required:50%
Theautodetect
method in theImage
class raises aValueError
if the source is not detected as a valid image. This is appropriate, but the error message could be more informative by including the source value.
5. instructor/multimodal.py:90
- Draft comment:
Include the exception message in theValueError
raised infrom_raw_base64
for better debugging information. - Reason this comment was not posted:
Confidence changes required:50%
Thefrom_raw_base64
method in theImage
class raises aValueError
with a generic message if an exception occurs. It would be more informative to include the exception message in the error message.
6. docs/concepts/multimodal.md:32
- Draft comment:
If this is a newmd
file, ensure it is added tomkdocs.yml
for documentation generation. - Reason this comment was not posted:
Confidence changes required:80%
The PR introduces new functionality related to image handling, including autodetection of image types. This is a significant change that should be documented and tested. The PR includes changes to themultimodal.py
file, which is part of the library code. Therefore, we need to ensure that the documentation and tests are updated accordingly.
7. instructor/multimodal.py:27
- Draft comment:
Theautodetect
method is quite complex. Consider adding comments to explain the logic, especially the decision-making process for determining the image type. - Reason this comment was not posted:
Confidence changes required:80%
The PR introduces new functionality related to image handling, including autodetection of image types. This is a significant change that should be documented and tested. The PR includes changes to themultimodal.py
file, which is part of the library code. Therefore, we need to ensure that the documentation and tests are updated accordingly.
8. instructor/multimodal.py:67
- Draft comment:
Theis_base64
method is used multiple times. Consider adding a comment to explain its purpose and the regex used. - Reason this comment was not posted:
Confidence changes required:80%
The PR introduces new functionality related to image handling, including autodetection of image types. This is a significant change that should be documented and tested. The PR includes changes to themultimodal.py
file, which is part of the library code. Therefore, we need to ensure that the documentation and tests are updated accordingly.
9. instructor/multimodal.py:93
- Draft comment:
Thefrom_url
method is quite complex. Consider adding comments to explain the logic, especially the decision-making process for determining the media type. - Reason this comment was not posted:
Confidence changes required:80%
The PR introduces new functionality related to image handling, including autodetection of image types. This is a significant change that should be documented and tested. The PR includes changes to themultimodal.py
file, which is part of the library code. Therefore, we need to ensure that the documentation and tests are updated accordingly.
10. instructor/multimodal.py:112
- Draft comment:
Thefrom_path
method is quite complex. Consider adding comments to explain the logic, especially the decision-making process for determining the media type and handling file errors. - Reason this comment was not posted:
Confidence changes required:80%
The PR introduces new functionality related to image handling, including autodetection of image types. This is a significant change that should be documented and tested. The PR includes changes to themultimodal.py
file, which is part of the library code. Therefore, we need to ensure that the documentation and tests are updated accordingly.
11. instructor/process_response.py:667
- Draft comment:
Ensure that theconvert_messages
function is tested for the newautodetect_images
parameter. - Reason this comment was not posted:
Confidence changes required:80%
The PR introduces new functionality related to image handling, including autodetection of image types. This is a significant change that should be documented and tested. The PR includes changes to themultimodal.py
file, which is part of the library code. Therefore, we need to ensure that the documentation and tests are updated accordingly.
Workflow ID: wflow_jUy5gxy21mUg8qVT
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
- More ruff
- Fix error match
this looks great but mind cleaning up the diff |
yes, I ruffed the whole repo by mistake lol. fixed now |
Ideally I'd like to be able to support this feature with Anthropic prompt caching more natively as currently this would require some gymnastics. My thought is that we could support a generic content dict such as {"type": "image", "source": <generic_str_source>} and that way users could specify the |
- Optionally cache expensive operations
- Pyright and remove double media type access for Anthropic
oh good idea on the cache key. as long as its well documented im +1 |
- Improve LRU caching
- Use Anthropic `cache_control` syntax and update LRU cache to handle mapping types - Test cache hits and enable / disable
- Ruff
@jxnl let me know what you think. I am not actively using Gemini, but it would be great if we could add Google support at a later date. |
why not use functools.lru_cache looks like a lot of extra code to implement the cache |
- Simplify caching and prompt caching logic - Remove cache config feature for now
Okay I realized that I didn't even need to hash the cache control logic and refactored everything to be much more readable |
... with no response model
- Remove unneeded var assignment
Summary:
I think
multimodal.py
is a huge step in the right direction for Instructor, especially as vision becomes more mainstream. Since every provider handles images uniquely, the hope is that the interface will only need to defineto_<provider>
methods, and theImage
class can handle serialization and routing as needed.Though experimental, I propose image autodetection to streamline vision use cases. Whether image data comes from a path, URL, or base64 encoded string, it should always be possible to automatically figure it out from a source string (bar edge cases where a user can be more explicit if needed). This allows users to specify content as a list of strings, and Instructor can handle the rest by setting
autodetect_images=True
.Change notes:
Image
to support all valid OpenAI and Anthropic vision mime typesImportant
Enhance
Image
class to support autodetection of image sources and update tests and documentation accordingly.Image
class inmultimodal.py
to support autodetection of image sources from URLs, file paths, or base64 strings.autodetect_images
parameter toconvert_messages()
inmultimodal.py
to enable automatic image detection.test_multimodal.py
,test_anthropic/test_multimodal.py
, andtest_openai/test_multimodal.py
forImage
autodetection and conversion functions.base64_jpeg
andbase64_png
intest_multimodal.py
for testing base64 image handling.multimodal.md
to include usage ofautodetect_images
parameter in examples.This description was created by
for 82c6846. It will automatically update as commits are pushed.