Skip to content

Anthropic image format fix #1273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Mar 20, 2025
Merged

Anthropic image format fix #1273

merged 17 commits into from
Mar 20, 2025

Conversation

jk10001
Copy link
Contributor

@jk10001 jk10001 commented Mar 9, 2025

Why are these changes needed?

Reformatting of messages containing images from OpenAI format to Claude format.
Also change system message handling so that message lists can be processed as well as strings.

Related issue number

Checks

jk10001 added 3 commits March 5, 2025 22:26
Reformat OpenAI image content item to Claude format. Also fix an issue processing the system message when the system message content is a list instead of a string.
@CLAassistant
Copy link

CLAassistant commented Mar 9, 2025

CLA assistant check
All committers have signed the CLA.

@davorrunje davorrunje self-assigned this Mar 10, 2025
@davorrunje
Copy link
Collaborator

@marklysze
Copy link
Collaborator

@jk10001 thanks so much for creating this, would you be able to add some tests?
https://github.com/ag2ai/ag2/blob/main/test/oai/test_anthropic.py

@jk10001
Copy link
Contributor Author

jk10001 commented Mar 15, 2025

@marklysze I've added some additional tests into test_anthropic.py.

@marklysze
Copy link
Collaborator

Great, thanks @jk10001, for the tests...

I tried this with the MultimodalConversableAgent with some tweaks in img_utils and it worked well for OCR. Would you have a simple example of how you use this? The reason I ask is it would be good to add to the Anthropic Model page in the docs and for testing.

@jk10001
Copy link
Contributor Author

jk10001 commented Mar 18, 2025

Hi @marklysze, here's a simple example adapting the code in the notebook https://docs.ag2.ai/docs/use-cases/notebooks/notebooks/agentchat_lmm_gpt-4v
I haven't updated docs before, might be a while before I can get into that.

import os
import autogen
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent
from dotenv import load_dotenv

load_dotenv()

config_claude_sonnet_37 = [
    {
        "model": "claude-3-7-sonnet-latest",
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "api_type": "anthropic"
    },
]

image_agent = MultimodalConversableAgent(
    name="image-explainer",
    max_consecutive_auto_reply=10,
    llm_config={"config_list": config_claude_sonnet_37, "temperature": 0.5, "max_tokens": 300},
)

user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    human_input_mode="NEVER", 
    max_consecutive_auto_reply=0,
    code_execution_config={
        "use_docker": False
    },  # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.
)

# Ask the question with an image
user_proxy.initiate_chat(
    image_agent,
    message="""What's the breed of this dog?
<img https://th.bing.com/th/id/OIP.29Mi2kJmcHHyQVGe_0NG7QHaEo?pid=ImgDet&rs=1>.""",)

@marklysze
Copy link
Collaborator

@jk10001 that's handy, thank you! We'll add that example into the docs separately.

I'm good to move this out of draft for a review if you are.

@jk10001
Copy link
Contributor Author

jk10001 commented Mar 19, 2025

@marklysze good with me, thanks!

@jk10001 jk10001 marked this pull request as ready for review March 19, 2025 21:48
Copy link
Collaborator

@marklysze marklysze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @jk10001, appreciate the enhancement

@marklysze marklysze added this pull request to the merge queue Mar 20, 2025
Merged via the queue into ag2ai:main with commit db2754a Mar 20, 2025
12 checks passed
Copy link

codecov bot commented Mar 20, 2025

Codecov Report

Attention: Patch coverage is 7.31707% with 38 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
autogen/oai/anthropic.py 7.31% 38 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (b2e367c) and HEAD (593a41e). Click for more details.

HEAD has 1265 uploads less than BASE
Flag BASE (b2e367c) HEAD (593a41e)
3.9 82 0
ubuntu-latest 145 1
commsagent-discord 9 0
optional-deps 141 0
core-without-llm 14 1
3.13 85 0
macos-latest 104 0
browser-use 7 0
3.11 64 1
3.12 36 0
commsagent-slack 9 0
3.10 96 0
windows-latest 114 0
commsagent-telegram 9 0
twilio 9 0
retrievechat-pgvector 10 0
interop-langchain 9 0
graph-rag-falkor-db 6 0
jupyter-executor 9 0
retrievechat 15 0
retrievechat-mongodb 10 0
interop 13 0
retrievechat-qdrant 14 0
crawl4ai 13 0
websockets 9 0
docs 6 0
interop-pydantic-ai 9 0
interop-crewai 9 0
cerebras 15 0
agent-eval 1 0
mistral 14 0
teachable 4 0
gpt-assistant-agent 3 0
lmm 4 0
together 14 0
long-context 3 0
retrievechat-couchbase 3 0
llama-index-agent 3 0
websurfer 15 0
gemini 15 0
anthropic 16 0
swarm 14 0
groq 14 0
ollama 15 0
cohere 15 0
bedrock 15 0
integration 12 0
core-llm 8 0
openai-realtime 1 0
captainagent 1 0
autobuild 1 0
deepseek 1 0
neo4j 2 0
falkordb 2 0
gemini-realtime 1 0
Files with missing lines Coverage Δ
autogen/oai/anthropic.py 20.89% <7.31%> (-57.22%) ⬇️

... and 63 files with indirect coverage changes

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants