[Feature] Improve AI Photo Captions with Optional Context Input or Metadata #7

Closed
Scout2022 opened this issue Nov 22, 2024 · 5 comments · Fixed by #16
@Scout2022

Would it be possible to add an optional text box to the AI photo recognition process? Users could type in extra details, such as location names or words related to the photo's content, to help the AI understand the image better. This would make the captions more accurate, especially for ambiguous images. The text box could appear either before or after the AI processes the photo, so users can provide context upfront or correct mistakes afterward by resending the photo to the AI.

Alternatively, the plugin could automatically extract location information from the photo's metadata, if available. This would provide context to the AI without requiring manual input from the user.
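
To make the first idea concrete, here is a minimal sketch of how optional user context could be appended to a caption prompt. It is only an illustration: `build_prompt`, `BASE_PROMPT`, and the prompt wording are hypothetical and are not taken from the plugin.

```python
from typing import Optional

# Hypothetical base prompt; the plugin's real wording is not known here.
BASE_PROMPT = "Write a short, factual caption for this photo."


def build_prompt(user_context: Optional[str] = None) -> str:
    """Return the caption prompt, extended by user-supplied context if any."""
    context = (user_context or "").strip()
    if not context:
        # No hint given: fall back to the unmodified prompt.
        return BASE_PROMPT
    return f"{BASE_PROMPT}\nContext provided by the photographer: {context}"
```

Because the context is optional, the same function would cover both the "before" case (context typed upfront) and the "after" case (the request is resent with a correction).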

@manuelcapellari

As an alternative to an optional text box, it would be useful to transmit existing metadata (existing keywords, geoinformation, etc.) along with the image, since the model cannot extract this information directly from the image, at least in the case of Gemini.

@bmachek
Owner

bmachek commented Dec 21, 2024

> As an alternative to an optional text box, it would be useful to transmit existing metadata (existing keywords, geoinformation, etc.) along with the image, since the model cannot extract this information directly from the image, at least in the case of Gemini.

This sounds like a better plan, since it would require less user interaction while the plugin is running. And most importantly it would improve my personal workflow. ;-)
Will give it a try soon... :-)

bmachek self-assigned this Dec 21, 2024
bmachek added the enhancement (New feature or request) label Dec 21, 2024
@bmachek
Owner

bmachek commented Dec 21, 2024

As it turned out I tried it immediately... ;-)

The referenced git branch contains a candidate that transmits the GPS coordinates of the analyzed photo to Gemini. Results vary from astonishingly specific and correct to very general. If you are able to check out a branch, you might want to give it a try.

to be continued
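
For readers who cannot check out the branch, the general idea of passing coordinates as text can be sketched as below. This is an assumption-laden illustration, not the code from the branch: `gps_hint` and the sentence it produces are hypothetical, and how the plugin actually reads the coordinates and words the request is not shown in this thread.

```python
from typing import Optional, Tuple


def gps_hint(coords: Optional[Tuple[float, float]]) -> str:
    """Format (latitude, longitude) in decimal degrees as a prompt line.

    Returns an empty string when the photo has no usable GPS data.
    """
    if coords is None:
        return ""
    lat, lon = coords
    # Reject values outside the valid range for decimal degrees.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return ""
    return f"The photo was taken at latitude {lat:.5f}, longitude {lon:.5f}."
```

Returning an empty string for missing or invalid coordinates lets the caller append the hint unconditionally, so photos without GPS data are simply sent without it.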

@manuelcapellari

> The referenced git branch contains a candidate that transmits the GPS coordinates of the analyzed photo to Gemini. Results vary from astonishingly specific and correct to very general. If you are able to check out a branch, you might want to give it a try.

incredible!

I have been following the development of LLMs for a long time. I believe time is working in our favor here as far as the quality of the results is concerned; development has sped up massively in the last 2 years.

@bmachek
Owner

bmachek commented Dec 22, 2024

Thanks. I added support for sending pre-existing keywords with the request. It doesn't affect the results much.
Happy holidays! :-)
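
As a rough illustration of sending pre-existing keywords with the request, the sketch below turns a keyword list into a single prompt line. The function name `keyword_hint` and the wording are hypothetical; the plugin's actual implementation may differ.

```python
from typing import Iterable


def keyword_hint(keywords: Iterable[str]) -> str:
    """Turn existing keywords into one prompt line, dropping blanks and duplicates."""
    seen = set()
    cleaned = []
    for kw in keywords:
        kw = kw.strip()
        key = kw.lower()  # compare case-insensitively, keep the first spelling
        if kw and key not in seen:
            seen.add(key)
            cleaned.append(kw)
    if not cleaned:
        return ""
    return "Existing keywords for this photo: " + ", ".join(cleaned) + "."
```

For example, `keyword_hint(["Vienna", " vienna ", "", "night"])` yields `"Existing keywords for this photo: Vienna, night."`.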
