Support for multi-modal RAG evaluation (including images, tables, etc.) #1030

Open
joly-chen opened this issue Jun 17, 2024 · 0 comments
Labels: enhancement (New feature or request)
Describe the Feature
Support for multi-modal RAG evaluation (including images, tables, etc.)
Multi-modal RAG is becoming more important.

@joly-chen joly-chen added the enhancement New feature or request label Jun 17, 2024
jjmachan added a commit that referenced this issue Oct 25, 2024
I am a developer from
[ModelScope](https://github.com/modelscope/modelscope). This framework
is great and I would like to add some new features. Multi-modal RAG
evaluation is important, as mentioned in
#1030.

This PR adds support for RAG evaluation over mixed image-and-text contexts. As a first step, it introduces MultiModalFaithfulness and MultiModalRelevance, adapted from LlamaIndex's implementations (reference:
[faithfulness](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/multi_modal/faithfulness.py)
and
[relevancy](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/multi_modal/relevancy.py)).
*The current evaluation metrics are still preliminary and can be further improved in the future.*
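As a rough illustration of what these metrics have to handle: `retrieved_contexts` can mix image file paths with plain text, so the evaluator must route each entry to the right modality before prompting the judge model. The helper below is a hypothetical sketch of that routing step (`split_contexts` and `IMAGE_EXTENSIONS` are illustrative names, not the actual PR code):

```python
from pathlib import Path

# Common raster-image extensions; illustrative, not exhaustive.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def split_contexts(retrieved_contexts):
    """Separate image-path entries from plain-text entries.

    Hypothetical helper: the real metrics in the PR do their own routing.
    """
    images, texts = [], []
    for ctx in retrieved_contexts:
        # Treat an entry as an image if it ends in a known image extension.
        if Path(ctx).suffix.lower() in IMAGE_EXTENSIONS:
            images.append(ctx)
        else:
            texts.append(ctx)
    return images, texts

images, texts = split_contexts([
    "custom_eval/multimodal/images/tesla.jpg",
    "The picture is related to an electric vehicle brand.",
])
print(images)  # ['custom_eval/multimodal/images/tesla.jpg']
print(texts)   # ['The picture is related to an electric vehicle brand.']
```

A real implementation would then attach the image entries as vision inputs and the text entries as ordinary context when building the judge prompt.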

The usage is as follows:
```python
from ragas.metrics import MultiModalFaithfulness, MultiModalRelevance
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
from datasets import Dataset
from ragas import evaluate

# wrap a model with interleaved image-text input, such as gpt-4o
llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

# load dataset
dataset = Dataset.from_json("outputs/testset_multi_modal.json")

# load metrics
metrics = [MultiModalFaithfulness(), MultiModalRelevance()]

# evaluate
score = evaluate(
    dataset,
    metrics=metrics,
    llm=llm,
)
score_df = score.to_pandas()
score_df
```
Input example:
```json
[
    {
        "user_input": "What brand is the car in the picture?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg",
            "The picture is related to an electric vehicle brand."
        ],
        "response": "Tesla is a car brand.",
        "reference": "The car brand in the picture is Tesla."
    },
    {
        "user_input": "What about the Tesla Model X?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg"
        ],
        "response": "Cats are cute.",
        "reference": "The Tesla Model X is an electric SUV manufactured by Tesla."
    }
]
```
Output example:
```json
[
    {
        "user_input": "What brand is the car in the picture?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg",
            "The picture is related to an electric vehicle brand."
        ],
        "response": "Tesla is a car brand.",
        "reference": "The car brand in the picture is Tesla.",
        "faithful_rate": true,
        "relevance_rate": true
    },
    {
        "user_input": "What about the Tesla Model X?",
        "retrieved_contexts": [
            "custom_eval/multimodal/images/tesla.jpg"
        ],
        "response": "Cats are cute.",
        "reference": "The Tesla Model X is an electric SUV manufactured by Tesla.",
        "faithful_rate": false,
        "relevance_rate": false
    }
]
```
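Since the metrics emit per-row booleans, dataset-level scores can be derived by averaging them. A minimal sketch with pandas, using the two rows from the output example above (the aggregation is my illustration, not part of the PR):

```python
import pandas as pd

# per-row verdicts as produced in the output example above
rows = [
    {"faithful_rate": True, "relevance_rate": True},
    {"faithful_rate": False, "relevance_rate": False},
]
df = pd.DataFrame(rows)

# fraction of rows judged faithful / relevant (booleans average to 0..1)
summary = df[["faithful_rate", "relevance_rate"]].mean()
print(summary["faithful_rate"])   # 0.5
print(summary["relevance_rate"])  # 0.5
```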

---------

Co-authored-by: jjmachan <[email protected]>