
[bug] Contradictory Arguments with Validation in TopicAdherenceScore #1564

Closed
1 task done
luqmansen opened this issue Oct 23, 2024 · 0 comments · Fixed by #1566
Labels
bug (Something isn't working) · module-metrics (this is part of metrics module)

Comments

@luqmansen
Contributor

luqmansen commented Oct 23, 2024

  • I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

I'm running the exact example from your documentation for topic adherence evaluation, except that I changed the scoring call from async to sync. In MultiTurnMetric.multi_turn_score, the function strips reference_topics from the MultiTurnSample instance I pass in (L335):

def multi_turn_score(
    self,
    sample: MultiTurnSample,
    callbacks: Callbacks = None,
) -> float:
    """
    Score a multi-turn conversation sample synchronously.
    May raise ImportError if nest_asyncio is not installed in Jupyter-like environments.
    """
    callbacks = callbacks or []
    sample = self._only_required_columns_multi_turn(sample)
This makes the type validation in TopicAdherenceScore._multi_turn_ascore fail (L150-152), because _required_columns for this class contains only user_input (L137-138):

class TopicAdherenceScore(MetricWithLLM, MultiTurnMetric):
    name: str = "topic_adherence"  # type: ignore
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {MetricType.MULTI_TURN: {"user_input"}}
    )
    mode: t.Literal["precision", "recall", "f1"] = "f1"
    topic_extraction_prompt: PydanticPrompt = TopicExtractionPrompt()
    topic_classification_prompt: PydanticPrompt = TopicClassificationPrompt()
    topic_refused_prompt: PydanticPrompt = TopicRefusedPrompt()

    async def _multi_turn_ascore(
        self, sample: MultiTurnSample, callbacks: Callbacks
    ) -> float:
        assert self.llm is not None, "LLM must be set"
        assert isinstance(sample.user_input, list), "Sample user_input must be a list"
        assert isinstance(
            sample.reference_topics, list
        ), "Sample reference_topics must be a list"

Ragas version: 0.2.2
Python version: 3.9

Code to Reproduce

from ragas.dataset_schema import SingleTurnSample, MultiTurnSample, EvaluationDataset
from ragas.messages import HumanMessage, AIMessage, ToolMessage, ToolCall
from ragas.metrics import TopicAdherenceScore


sample_input_4 = [
    HumanMessage(content="Can you provide me with details about Einstein's theory of relativity?"),
    AIMessage(content="Sure, let me retrieve the relevant information for you.", tool_calls=[
        ToolCall(name="document_search", args={"query": "Einstein's theory of relativity"})
    ]),
    ToolMessage(content="Found relevant documents: 1. Relativity: The Special and the General Theory, 2. General Theory of Relativity by A. Einstein."),
    AIMessage(content="I found some documents on Einstein's theory of relativity. Which one would you like to know more about: 'Relativity: The Special and the General Theory' or 'General Theory of Relativity by A. Einstein'?"),
    HumanMessage(content="Tell me about the 'General Theory of Relativity'."),
    AIMessage(content="Got it! Let me fetch more details from 'General Theory of Relativity by A. Einstein'.", tool_calls=[
        ToolCall(name="document_retrieve", args={"document": "General Theory of Relativity by A. Einstein"})
    ]),
    ToolMessage(content="The document discusses how gravity affects the fabric of spacetime, describing the relationship between mass and spacetime curvature."),
    AIMessage(content="The 'General Theory of Relativity' explains how gravity affects the fabric of spacetime and the relationship between mass and spacetime curvature. Would you like more details or a specific explanation?"),
    HumanMessage(content="No, that's perfect. By the way, do you know any good recipes for a chocolate cake?"),
    AIMessage(content="Sure! Let me find a simple and delicious recipe for a chocolate cake.", tool_calls=[
        ToolCall(name="recipe_search", args={"query": "chocolate cake recipe"})
    ]),
    ToolMessage(content="Here’s a popular recipe for a chocolate cake: Ingredients include flour, sugar, cocoa powder, eggs, milk, and butter. Instructions: Mix dry ingredients, add wet ingredients, and bake at 350°F for 30-35 minutes."),
    AIMessage(content="I found a great recipe for chocolate cake! Would you like the full details, or is that summary enough?")
]


sample = MultiTurnSample(user_input=sample_input_4, reference_topics=["science"])
scorer = TopicAdherenceScore(mode="precision", llm=azure_llm) # use your own LLM
scorer.multi_turn_score(sample)

Error trace

AssertionError                            Traceback (most recent call last)
Cell In[24], line 32
     30 sample = MultiTurnSample(user_input=sample_input_4, reference_topics=sample_reference)
     31 scorer = TopicAdherenceScore(mode="precision", llm=azure_llm)
---> 32 scorer.multi_turn_score(sample)

File ~/.local/share/virtualenvs/-fqIfThIg/lib/python3.9/site-packages/ragas/metrics/base.py:359, in MultiTurnMetric.multi_turn_score(self, sample, callbacks)
    357     if not group_cm.ended:
    358         rm.on_chain_error(e)
--> 359     raise e
    360 else:
    361     if not group_cm.ended:

File ~/.local/share/virtualenvs/-fqIfThIg/lib/python3.9/site-packages/ragas/metrics/base.py:353, in MultiTurnMetric.multi_turn_score(self, sample, callbacks)
    349             raise ImportError(
    350                 "It seems like your running this in a jupyter-like environment. Please install nest_asyncio with `pip install nest_asyncio` to make it work."
    351             )
    352     loop = asyncio.get_event_loop()
--> 353     score = loop.run_until_complete(
    354         self._multi_turn_ascore(sample=sample, callbacks=group_cm)
    355     )
    356 except Exception as e:
    357     if not group_cm.ended:

File ~/.local/share/virtualenvs/-fqIfThIg/lib/python3.9/site-packages/nest_asyncio.py:98, in _patch_loop.<locals>.run_until_complete(self, future)
     95 if not f.done():
     96     raise RuntimeError(
     97         'Event loop stopped before Future completed.')
---> 98 return f.result()

File ~/.pyenv/versions/3.9.16/lib/python3.9/asyncio/futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception
    202 return self._result

File ~/.pyenv/versions/3.9.16/lib/python3.9/asyncio/tasks.py:256, in Task.__step(***failed resolving arguments***)
    252 try:
    253     if exc is None:
    254         # We use the `send` method directly, because coroutines
    255         # don't have `__iter__` and `__next__` methods.
--> 256         result = coro.send(None)
    257     else:
    258         result = coro.throw(exc)

File ~/.local/share/virtualenvs/-fqIfThIg/lib/python3.9/site-packages/ragas/metrics/_topic_adherence.py:150, in TopicAdherenceScore._multi_turn_ascore(self, sample, callbacks)
    148 assert self.llm is not None, "LLM must be set"
    149 assert isinstance(sample.user_input, list), "Sample user_input must be a list"
--> 150 assert isinstance(
    151     sample.reference_topics, list
    152 ), "Sample reference_topics must be a list"
    153 user_input = sample.pretty_repr()
    155 prompt_input = TopicExtractionInput(user_input=user_input)

AssertionError: Sample reference_topics must be a list
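The assertion fires because reference_topics was already stripped before _multi_turn_ascore ran. As a possible interim workaround (my assumption, not an officially supported API), the instance's declared required columns can be widened before calling the sync scorer:

from ragas.metrics.base import MetricType

scorer = TopicAdherenceScore(mode="precision", llm=azure_llm)  # use your own LLM
# Hypothetical workaround: declare reference_topics as required so that
# _only_required_columns_multi_turn no longer strips it from the sample.
scorer._required_columns[MetricType.MULTI_TURN].add("reference_topics")
scorer.multi_turn_score(sample)

Since this mutates a private attribute, it should be dropped once the fix lands.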

Expected behavior
This should just work. reference_topics should be part of the TopicAdherenceScore._required_columns property and should not be stripped off the MultiTurnSample instance.
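For reference, a minimal sketch of what such a fix could look like, assuming it simply adds reference_topics to the declared multi-turn columns (the actual change shipped with #1566 and may differ):

class TopicAdherenceScore(MetricWithLLM, MultiTurnMetric):
    name: str = "topic_adherence"  # type: ignore
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.MULTI_TURN: {"user_input", "reference_topics"}
        }
    )
    # remaining fields and methods unchanged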


@luqmansen luqmansen added the bug label Oct 23, 2024
@dosubot dosubot bot added the module-metrics label Oct 23, 2024
@luqmansen luqmansen changed the title from "Contradictory Arguments with Validation in" to "Contradictory Arguments with Validation in TopicAdherenceScore" Oct 23, 2024
@luqmansen luqmansen changed the title from "Contradictory Arguments with Validation in TopicAdherenceScore" to "[bug] Contradictory Arguments with Validation in TopicAdherenceScore" Oct 23, 2024