
Feat: add multimodal eval support #1559

Merged: 7 commits, Oct 25, 2024

Conversation

Yunnglin (Contributor)

I am a developer from ModelScope. This framework is great and I would like to add some new features. Multi-modal RAG evaluation is important, as mentioned in #1030.

This PR adds support for RAG evaluation over mixed image-text contexts. It currently provides two metrics, MultiModalFaithfulness and MultiModalRelevance, adapted from LlamaIndex's faithfulness and relevancy evaluators. These metrics are still preliminary and can be improved further in the future.

The usage is as follows:

from datasets import Dataset

from ragas import evaluate
from ragas.metrics import MultiModalFaithfulness, MultiModalRelevance

# load dataset
dataset = Dataset.from_json("outputs/testset_multi_modal.json")

# load metrics
metrics = [MultiModalFaithfulness(), MultiModalRelevance()]

# evaluate; `llm` is a judge model that accepts interleaved image-text input,
# such as gpt-4o, constructed beforehand (one possible setup, for example:
#   from ragas.llms import LangchainLLMWrapper
#   from langchain_openai import ChatOpenAI
#   llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
# )
score = evaluate(
    dataset,
    metrics=metrics,
    llm=llm,
)
score_df = score.to_pandas()
score_df

Input example:

[
    {
        "user_input": "What brand is the car in the picture?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg",
            "The picture is related to an electric vehicle brand."
        ],
        "response": "Tesla is a car brand.",
        "reference": "The car brand in the picture is Tesla."
    },
    {
        "user_input": "What about the Tesla Model X?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg"
        ],
        "response": "Cats are cute.",
        "reference": "The Tesla Model X is an electric SUV manufactured by Tesla."
    }
]
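
Note that each entry in retrieved_contexts may be either an image path or plain text. As a rough illustration of how such mixed contexts can be told apart, here is a minimal sketch using a standard-library MIME check (the helper name is hypothetical; the PR's actual detection logic may differ, e.g. it may also handle URLs):

import mimetypes

def classify_context(ctx: str) -> str:
    # Hypothetical helper: label a retrieved context as "image" or "text"
    # based on its file extension; anything unrecognized is treated as text.
    mime, _ = mimetypes.guess_type(ctx)
    return "image" if mime and mime.startswith("image/") else "text"

contexts = [
    "custom_eval/multimodal/images/tesla.jpg",
    "The picture is related to an electric vehicle brand.",
]
print([classify_context(c) for c in contexts])  # ['image', 'text']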

Output example:

[
    {
        "user_input": "What brand is the car in the picture?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg",
            "The picture is related to an electric vehicle brand."
        ],
        "response": "Tesla is a car brand.",
        "reference": "The car brand in the picture is Tesla.",
        "faithful_rate": true,
        "relevance_rate": true
    },
    {
        "user_input": "What about the Tesla Model X?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg"
        ],
        "response": "Cats are cute.",
        "reference": "The Tesla Model X is an electric SUV manufactured by Tesla.",
        "faithful_rate": false,
        "relevance_rate": false
    }
]
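
Because to_pandas() returns a regular DataFrame, aggregate pass rates can be computed directly. A small usage sketch based on the column names in the output above (the boolean verdicts average to the fraction of passing samples):

# mean of the boolean verdict columns = fraction of samples judged
# faithful / relevant across the dataset
pass_rates = score_df[["faithful_rate", "relevance_rate"]].mean()
print(pass_rates)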

@Yunnglin changed the title from "Add multimodal eval support" to "Feat: add multimodal eval support" on Oct 23, 2024
@jjmachan requested a review from shahules786 on Oct 23, 2024
@jjmachan (Member)

hey @Yunnglin I had a quick look at this and it's great - thanks a lot for contributing it ❤️

testing it on my end too and will merge it in. I also see a couple of type-check errors; will you be tackling them, or should I help (happy to 🙂)?

@Yunnglin (Contributor, Author)

Hello, I have corrected these errors. Could you please recheck them?

@shahules786 (Member)

Hey @Yunnglin this seems great. We could improve the method for calculating faithfulness later on if required. It would be great if you could add these two to the docs as well; they would go under the RAG section - https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/
A small description of both would be perfect. Let me know if you need help with this.

@Yunnglin (Contributor, Author)

I have added the relevant documentation. Could you please take a look and see if any modifications are needed?

@shahules786 requested a review from jjmachan on Oct 25, 2024
@shahules786 (Member) left a review comment

LGTM

@jjmachan (Member)


just made some small fixes for callbacks support

@jjmachan (Member)

thanks a lot @Yunnglin for the PR - made a couple of small tweaks to merge it in but looks great ❤️

btw we have a form for goodies do check it out 🙂 https://docs.google.com/forms/d/e/1FAIpQLSdM9FrrZrnpByG4XxuTbcAB-zn-Z7i_a7CsMkgBVOWQjRJckg/viewform

@jjmachan merged commit 0f412de into explodinggradients:main on Oct 25, 2024 (15 checks passed)
@ethanelasky commented on Dec 6, 2024

Hi, nice work! Is it possible to add base64 image support as well (to mirror how Anthropic/OpenAI-compatible models accept images)?
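
For reference, "base64 support" here would mean accepting an image as a data URL rather than a file path. A minimal standard-library sketch of that encoding, outside the scope of this PR (the function name is illustrative):

import base64
import mimetypes

def to_data_url(image_path: str) -> str:
    # Encode a local image file as a base64 data URL of the form
    # "data:image/jpeg;base64,/9j/4AAQ..." accepted by OpenAI/Anthropic-style APIs.
    mime, _ = mimetypes.guess_type(image_path)
    with open(image_path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{payload}"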

@simjak commented on Jan 21, 2025

base64 would be very useful. @jjmachan, any examples of how I can evaluate multimodal retrieval?
https://mragbench.github.io/

@jjmachan (Member)

hey @simjak I will take a look at this and let you know :)

btw, are you on Discord?
