
[bug] Contradictory Arguments with Validation in TopicAdherenceScore #1564

Closed
1 task done
luqmansen opened this issue Oct 23, 2024 · 0 comments · Fixed by #1566
Labels
bug (Something isn't working) · module-metrics (this is part of metrics module)

Comments

@luqmansen
Contributor

luqmansen commented Oct 23, 2024

  • I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

I'm running the exact example from your documentation for topic adherence evaluation, except that I changed the scoring call from async to sync. In MultiTurnMetric.multi_turn_score, the function strips reference_topics from the MultiTurnSample instance I pass in (L335):

def multi_turn_score(
    self,
    sample: MultiTurnSample,
    callbacks: Callbacks = None,
) -> float:
    """
    Score a multi-turn conversation sample synchronously.
    May raise ImportError if nest_asyncio is not installed in Jupyter-like environments.
    """
    callbacks = callbacks or []
    sample = self._only_required_columns_multi_turn(sample)
This makes the type validation in TopicAdherenceScore._multi_turn_ascore fail (L150-152), because _required_columns for this class contains only user_input (L137-138):

class TopicAdherenceScore(MetricWithLLM, MultiTurnMetric):
    name: str = "topic_adherence"  # type: ignore
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {MetricType.MULTI_TURN: {"user_input"}}
    )
    mode: t.Literal["precision", "recall", "f1"] = "f1"
    topic_extraction_prompt: PydanticPrompt = TopicExtractionPrompt()
    topic_classification_prompt: PydanticPrompt = TopicClassificationPrompt()
    topic_refused_prompt: PydanticPrompt = TopicRefusedPrompt()

    async def _multi_turn_ascore(
        self, sample: MultiTurnSample, callbacks: Callbacks
    ) -> float:
        assert self.llm is not None, "LLM must be set"
        assert isinstance(sample.user_input, list), "Sample user_input must be a list"
        assert isinstance(
            sample.reference_topics, list
        ), "Sample reference_topics must be a list"

Ragas version: 0.2.2
Python version: 3.9

Code to Reproduce

from ragas.dataset_schema import SingleTurnSample, MultiTurnSample, EvaluationDataset
from ragas.messages import HumanMessage, AIMessage, ToolMessage, ToolCall
from ragas.metrics import TopicAdherenceScore


sample_input_4 = [
    HumanMessage(content="Can you provide me with details about Einstein's theory of relativity?"),
    AIMessage(content="Sure, let me retrieve the relevant information for you.", tool_calls=[
        ToolCall(name="document_search", args={"query": "Einstein's theory of relativity"})
    ]),
    ToolMessage(content="Found relevant documents: 1. Relativity: The Special and the General Theory, 2. General Theory of Relativity by A. Einstein."),
    AIMessage(content="I found some documents on Einstein's theory of relativity. Which one would you like to know more about: 'Relativity: The Special and the General Theory' or 'General Theory of Relativity by A. Einstein'?"),
    HumanMessage(content="Tell me about the 'General Theory of Relativity'."),
    AIMessage(content="Got it! Let me fetch more details from 'General Theory of Relativity by A. Einstein'.", tool_calls=[
        ToolCall(name="document_retrieve", args={"document": "General Theory of Relativity by A. Einstein"})
    ]),
    ToolMessage(content="The document discusses how gravity affects the fabric of spacetime, describing the relationship between mass and spacetime curvature."),
    AIMessage(content="The 'General Theory of Relativity' explains how gravity affects the fabric of spacetime and the relationship between mass and spacetime curvature. Would you like more details or a specific explanation?"),
    HumanMessage(content="No, that's perfect. By the way, do you know any good recipes for a chocolate cake?"),
    AIMessage(content="Sure! Let me find a simple and delicious recipe for a chocolate cake.", tool_calls=[
        ToolCall(name="recipe_search", args={"query": "chocolate cake recipe"})
    ]),
    ToolMessage(content="Here’s a popular recipe for a chocolate cake: Ingredients include flour, sugar, cocoa powder, eggs, milk, and butter. Instructions: Mix dry ingredients, add wet ingredients, and bake at 350°F for 30-35 minutes."),
    AIMessage(content="I found a great recipe for chocolate cake! Would you like the full details, or is that summary enough?")
]


sample = MultiTurnSample(user_input=sample_input_4, reference_topics=["science"])
scorer = TopicAdherenceScore(mode="precision", llm=azure_llm) # use your own LLM
scorer.multi_turn_score(sample)

Error trace

AssertionError                            Traceback (most recent call last)
Cell In[24], line 32
     30 sample = MultiTurnSample(user_input=sample_input_4, reference_topics=sample_reference)
     31 scorer = TopicAdherenceScore(mode="precision", llm=azure_llm)
---> 32 scorer.multi_turn_score(sample)

File ~/.local/share/virtualenvs/-fqIfThIg/lib/python3.9/site-packages/ragas/metrics/base.py:359, in MultiTurnMetric.multi_turn_score(self, sample, callbacks)
    357     if not group_cm.ended:
    358         rm.on_chain_error(e)
--> 359     raise e
    360 else:
    361     if not group_cm.ended:

File ~/.local/share/virtualenvs/-fqIfThIg/lib/python3.9/site-packages/ragas/metrics/base.py:353, in MultiTurnMetric.multi_turn_score(self, sample, callbacks)
    349             raise ImportError(
    350                 "It seems like your running this in a jupyter-like environment. Please install nest_asyncio with `pip install nest_asyncio` to make it work."
    351             )
    352     loop = asyncio.get_event_loop()
--> 353     score = loop.run_until_complete(
    354         self._multi_turn_ascore(sample=sample, callbacks=group_cm)
    355     )
    356 except Exception as e:
    357     if not group_cm.ended:

File ~/.local/share/virtualenvs/-fqIfThIg/lib/python3.9/site-packages/nest_asyncio.py:98, in _patch_loop.<locals>.run_until_complete(self, future)
     95 if not f.done():
     96     raise RuntimeError(
     97         'Event loop stopped before Future completed.')
---> 98 return f.result()

File ~/.pyenv/versions/3.9.16/lib/python3.9/asyncio/futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception
    202 return self._result

File ~/.pyenv/versions/3.9.16/lib/python3.9/asyncio/tasks.py:256, in Task.__step(***failed resolving arguments***)
    252 try:
    253     if exc is None:
    254         # We use the `send` method directly, because coroutines
    255         # don't have `__iter__` and `__next__` methods.
--> 256         result = coro.send(None)
    257     else:
    258         result = coro.throw(exc)

File ~/.local/share/virtualenvs/-fqIfThIg/lib/python3.9/site-packages/ragas/metrics/_topic_adherence.py:150, in TopicAdherenceScore._multi_turn_ascore(self, sample, callbacks)
    148 assert self.llm is not None, "LLM must be set"
    149 assert isinstance(sample.user_input, list), "Sample user_input must be a list"
--> 150 assert isinstance(
    151     sample.reference_topics, list
    152 ), "Sample reference_topics must be a list"
    153 user_input = sample.pretty_repr()
    155 prompt_input = TopicExtractionInput(user_input=user_input)

AssertionError: Sample reference_topics must be a list
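The assertion fires because reference_topics was already stripped before _multi_turn_ascore ran. As a possible interim workaround (my assumption, not an officially supported API), the instance's declared required columns can be widened before calling the sync scorer:

from ragas.metrics.base import MetricType

scorer = TopicAdherenceScore(mode="precision", llm=azure_llm)  # use your own LLM
# Hypothetical workaround: declare reference_topics as required so that
# _only_required_columns_multi_turn no longer strips it from the sample.
scorer._required_columns[MetricType.MULTI_TURN].add("reference_topics")
scorer.multi_turn_score(sample)

Since this mutates a private attribute, it should be dropped once the fix lands.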

Expected behavior
This should just work. reference_topics should be part of the TopicAdherenceScore._required_columns property and should not be stripped off the MultiTurnSample instance.
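For reference, a minimal sketch of what such a fix could look like, assuming it simply adds reference_topics to the declared multi-turn columns (the actual change shipped with #1566 and may differ):

class TopicAdherenceScore(MetricWithLLM, MultiTurnMetric):
    name: str = "topic_adherence"  # type: ignore
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.MULTI_TURN: {"user_input", "reference_topics"}
        }
    )
    # remaining fields and methods unchanged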


@luqmansen luqmansen added the bug label Oct 23, 2024
@dosubot dosubot bot added the module-metrics label Oct 23, 2024
@luqmansen luqmansen changed the title from "Contradictory Arguments with Validation in" to "Contradictory Arguments with Validation in TopicAdherenceScore" Oct 23, 2024
@luqmansen luqmansen changed the title from "Contradictory Arguments with Validation in TopicAdherenceScore" to "[bug] Contradictory Arguments with Validation in TopicAdherenceScore" Oct 23, 2024