Support for multi-modal RAG evaluation (including images, tables, etc.) #1030

Open
joly-chen opened this issue Jun 17, 2024 · 0 comments
Labels: enhancement (New feature or request)
Describe the Feature
Support for multi-modal RAG evaluation (including images, tables, etc.)
Multi-modal RAG is becoming more important.

@joly-chen joly-chen added the enhancement New feature or request label Jun 17, 2024
jjmachan added a commit that referenced this issue Oct 25, 2024
I am a developer from
[ModelScope](https://github.com/modelscope/modelscope). This framework
is great and I would like to add some new features. Multi-modal RAG
evaluation is important, as mentioned in
#1030.

This PR adds support for RAG evaluation over mixed image-and-text contexts. As a first step, it introduces MultiModalFaithfulness and MultiModalRelevance, adapted from LlamaIndex's implementations (reference:
[faithfulness](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/multi_modal/faithfulness.py)
and
[relevancy](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/multi_modal/relevancy.py)).
*The current evaluation metrics are still preliminary and can be further improved in the future.*
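As a rough illustration of what these metrics have to handle: `retrieved_contexts` can mix image file paths with plain text, so the evaluator must route each entry to the right modality before prompting the judge model. The helper below is a hypothetical sketch of that routing step (`split_contexts` and `IMAGE_EXTENSIONS` are illustrative names, not the actual PR code):

```python
from pathlib import Path

# Common raster-image extensions; illustrative, not exhaustive.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def split_contexts(retrieved_contexts):
    """Separate image-path entries from plain-text entries.

    Hypothetical helper: the real metrics in the PR do their own routing.
    """
    images, texts = [], []
    for ctx in retrieved_contexts:
        # Treat an entry as an image if it ends in a known image extension.
        if Path(ctx).suffix.lower() in IMAGE_EXTENSIONS:
            images.append(ctx)
        else:
            texts.append(ctx)
    return images, texts

images, texts = split_contexts([
    "custom_eval/multimodal/images/tesla.jpg",
    "The picture is related to an electric vehicle brand.",
])
print(images)  # ['custom_eval/multimodal/images/tesla.jpg']
print(texts)   # ['The picture is related to an electric vehicle brand.']
```

A real implementation would then attach the image entries as vision inputs and the text entries as ordinary context when building the judge prompt.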

The usage is as follows:
```python
from ragas.metrics import MultiModalFaithfulness, MultiModalRelevance
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
from datasets import Dataset
from ragas import evaluate

# wrap a model with interleaved image-text input, such as gpt-4o
llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

# load dataset
dataset = Dataset.from_json("outputs/testset_multi_modal.json")

# load metrics
metrics = [MultiModalFaithfulness(), MultiModalRelevance()]

# evaluate
score = evaluate(
    dataset,
    metrics=metrics,
    llm=llm,
)
score_df = score.to_pandas()
score_df
```
Input example:
```json
[
    {
        "user_input": "What brand is the car in the picture?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg",
            "The picture is related to an electric vehicle brand."
        ],
        "response": "Tesla is a car brand.",
        "reference": "The car brand in the picture is Tesla."
    },
    {
        "user_input": "What about the Tesla Model X?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg"
        ],
        "response": "Cats are cute.",
        "reference": "The Tesla Model X is an electric SUV manufactured by Tesla."
    }
]
```
Output example:
```json
[
    {
        "user_input": "What brand is the car in the picture?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg",
            "The picture is related to an electric vehicle brand."
        ],
        "response": "Tesla is a car brand.",
        "reference": "The car brand in the picture is Tesla.",
        "faithful_rate": true,
        "relevance_rate": true
    },
    {
        "user_input": "What about the Tesla Model X?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg"
        ],
        "response": "Cats are cute.",
        "reference": "The Tesla Model X is an electric SUV manufactured by Tesla.",
        "faithful_rate": false,
        "relevance_rate": false
    }
]
```
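Since the metrics emit per-row booleans, dataset-level scores can be derived by averaging them. A minimal sketch with pandas, using the two rows from the output example above (the aggregation is my illustration, not part of the PR):

```python
import pandas as pd

# per-row verdicts as produced in the output example above
rows = [
    {"faithful_rate": True, "relevance_rate": True},
    {"faithful_rate": False, "relevance_rate": False},
]
df = pd.DataFrame(rows)

# fraction of rows judged faithful / relevant (booleans average to 0..1)
summary = df[["faithful_rate", "relevance_rate"]].mean()
print(summary["faithful_rate"])   # 0.5
print(summary["relevance_rate"])  # 0.5
```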

---------

Co-authored-by: jjmachan <[email protected]>