I am a developer from
[ModelScope](https://github.com/modelscope/modelscope). This framework
is great, and I would like to contribute some new features. Multi-modal
RAG evaluation is important, as mentioned in
#1030.
This PR adds support for RAG evaluation over mixed image-text contexts.
As a first step, it supports MultiModalFaithfulness and
MultiModalRelevance, modeled on LlamaIndex (reference:
[faithfulness](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/multi_modal/faithfulness.py)
and
[relevancy](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/multi_modal/relevancy.py)).
*The current evaluation metrics are still quite preliminary and can be
further improved in the future.*
The usage is as follows:
```python
from ragas.metrics import MultiModalFaithfulness, MultiModalRelevance
from datasets import Dataset
from ragas import evaluate

# load dataset
dataset = Dataset.from_json("outputs/testset_multi_modal.json")

# load metrics
metrics = [MultiModalFaithfulness(), MultiModalRelevance()]

# evaluate
score = evaluate(
    dataset,
    metrics=metrics,
    llm=llm,  # a model with interleaved image-text input, such as gpt-4o
)
score_df = score.to_pandas()
score_df
```
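Note that `llm` above must be supplied by the caller; any model that accepts interleaved image-text input works. Internally, a multi-modal metric also has to decide whether each retrieved context string is an image reference or plain text. The exact detection logic lives in the PR; the sketch below is only a hypothetical stand-in (`looks_like_image` is not part of the PR) using stdlib MIME guessing:

```python
import mimetypes

def looks_like_image(context: str) -> bool:
    # Hypothetical helper: guess whether a retrieved context string
    # points at an image, based on its path/URL suffix.
    guessed, _ = mimetypes.guess_type(context)
    return guessed is not None and guessed.startswith("image/")

print(looks_like_image("custom_eval/multimodal/images/tesla.jpg"))   # True
print(looks_like_image("The picture is related to an EV brand."))    # False
```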
Input example:
```json
[
{
"user_input": "What brand is the car in the picture?",
"retrieved_contexts": [
"custom_eval/multimodal/images/tesla.jpg",
"The picture is related to an electric vehicle brand."
],
"response": "Tesla is a car brand.",
"reference": "The car brand in the picture is Tesla."
},
{
"user_input": "What about the Tesla Model X?",
"retrieved_contexts": [
"custom_eval/multimodal/images/tesla.jpg"
],
"response": "Cats are cute.",
"reference": "The Tesla Model X is an electric SUV manufactured by Tesla."
}
]
```
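Each record mixes image paths and plain-text snippets inside `retrieved_contexts`. A quick stdlib sanity check (illustrative only, not part of this PR) that a record carries the fields the metrics expect:

```python
import json

# One record from the input example above, as a JSON string.
raw = """
{
  "user_input": "What brand is the car in the picture?",
  "retrieved_contexts": [
    "custom_eval/multimodal/images/tesla.jpg",
    "The picture is related to an electric vehicle brand."
  ],
  "response": "Tesla is a car brand.",
  "reference": "The car brand in the picture is Tesla."
}
"""

record = json.loads(raw)
required = {"user_input", "retrieved_contexts", "response", "reference"}
missing = required - record.keys()
assert not missing, f"record is missing fields: {missing}"
assert isinstance(record["retrieved_contexts"], list)
print("record has all required fields")
```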
Output example:
```json
[
{
"user_input": "What brand is the car in the picture?",
"retrieved_contexts": [
"custom_eval/multimodal/images/tesla.jpg",
"The picture is related to an electric vehicle brand."
],
"response": "Tesla is a car brand.",
"reference": "The car brand in the picture is Tesla.",
"faithful_rate": true,
"relevance_rate": true
},
{
"user_input": "What about the Tesla Model X?",
"retrieved_contexts": [
"custom_eval/multimodal/images/tesla.jpg"
],
"response": "Cats are cute.",
"reference": "The Tesla Model X is an electric SUV manufactured by Tesla.",
"faithful_rate": false,
"relevance_rate": false
}
]
```
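The per-sample booleans can then be aggregated into corpus-level pass rates; a minimal sketch over the two example records (plain Python post-processing, not part of this PR):

```python
# Scores taken from the output example above.
results = [
    {"faithful_rate": True, "relevance_rate": True},
    {"faithful_rate": False, "relevance_rate": False},
]

# Fraction of samples that passed each metric.
pass_rates = {
    key: sum(r[key] for r in results) / len(results)
    for key in ("faithful_rate", "relevance_rate")
}
print(pass_rates)  # {'faithful_rate': 0.5, 'relevance_rate': 0.5}
```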
---------
Co-authored-by: jjmachan <[email protected]>
> **Describe the Feature**
> Support for multi-modal RAG evaluation (including images, tables, etc.).
> Multi-modal RAG is becoming more important.