feat: Update Ragas integration #1312
Conversation
Hi @davidsbatista, the test cases are failing in Python 3.9 due to a regex mismatch, although they pass in Python 3.10. Could you please advise on the best way to proceed?
Hello @sahusiddharth and thank you for opening this pull request! I left a few comments, just smaller change requests. Regarding the failing test, I have an idea of what's happening here. The regex and the output don't match on Python 3.9 because type(expected_type) outputs typing.Dict, which is not the same as Dict. On Python 3.10+, type(expected_type) outputs dict, and typing.Dict[str, str] just makes it even more complex (see https://peps.python.org/pep-0585/).
Regex: "The 'rubrics' field expected 'one of Dict, NoneType', but got 'list'."
E Input: "Validation error occured while running RagasEvaluator Component:\nThe 'rubrics' field expected 'one of typing.Dict[str, str], NoneType', but got 'list'.\nHint: Provide {'score1': 'high_similarity'}"
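For context, the difference is easy to see in a plain interpreter. The snippet below only illustrates the differing string forms of the typing alias and the PEP 585 builtin generic; how the component actually builds the message is not shown in this thread:

```python
import typing

# A typing alias and the builtin generic stringify differently, so a regex
# written against one form will not match the other.
print(str(typing.Dict[str, str]))                # typing.Dict[str, str]
print(str(dict[str, str]))                       # dict[str, str] (PEP 585, Python 3.9+)
print(typing.Dict[str, str].__origin__ is dict)  # True
```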
This regex matching of the message is error-prone anyway, and we should make the test more robust: let's not try to match the exact error message with a regex.
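One way to do that, sketched below with a stand-in validation helper (the real check lives inside RagasEvaluator; only the pytest pattern is the point here), is to assert on the exception type and the stable part of the message instead of the full text:

```python
import pytest


def validate_rubrics(rubrics):
    # Stand-in for the component's validation, just to make the example runnable.
    if rubrics is not None and not isinstance(rubrics, dict):
        raise ValueError(
            f"The 'rubrics' field expected 'one of Dict, NoneType', but got '{type(rubrics).__name__}'."
        )


def test_rubrics_must_be_a_dict():
    # Match only the stable field name, not the Python-version-dependent
    # rendering of the expected type.
    with pytest.raises(ValueError, match="rubrics"):
        validate_rubrics(["high_similarity"])
```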
evaluation_dataset = EvaluationDataset.from_list(evals_list)

llm = ChatOpenAI(model="gpt-4o-mini")
evaluator_llm = LangchainLLMWrapper(llm)
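For anyone following along, a self-contained version of this snippet (assuming ragas 0.2 and langchain-openai are installed; the sample record below is made up) might look like:

```python
from langchain_openai import ChatOpenAI
from ragas import EvaluationDataset
from ragas.llms import LangchainLLMWrapper

# Each record uses the field names Ragas 0.2 expects for single-turn samples.
evals_list = [
    {
        "user_input": "What is Haystack?",
        "response": "Haystack is an open source framework for building LLM applications.",
        "retrieved_contexts": ["Haystack is an open source LLM framework by deepset."],
        "reference": "Haystack is an open source LLM framework.",
    }
]

evaluation_dataset = EvaluationDataset.from_list(evals_list)

llm = ChatOpenAI(model="gpt-4o-mini")
evaluator_llm = LangchainLLMWrapper(llm)
```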
I guess we could think about a HaystackLLMWrapper? What do you think? :)
We use Langchain wrappers underneath to handle the evaluations, so if we move forward with Haystack, we might need to implement similar functionality ourselves or at least ensure the integration aligns with the existing workflow.
But we can definitely explore this further if you think it’s worth the effort!
(Several review comments on integrations/ragas/src/haystack_integrations/components/evaluators/ragas/evaluator.py were marked as resolved.)
Looks good to me! Thank you for addressing the review comments so quickly. I will take care of releasing a new version of the integration now. Feel free to open a PR that edits the example code in this file to update this overview page: https://haystack.deepset.ai/integrations/ragas
And yes, let's explore a HaystackLLMWrapper!
I’m currently working on the code example and will raise a PR for it shortly. I’ll keep you updated once it’s ready!
Hi @julian-risch, I’ve raised the PR for the code example. Could you please take a look when you have a moment and let me know your thoughts?
Added RagasEvaluator component for Ragas version 0.2 support. This update introduces the new RagasEvaluator component, designed to be fully compatible with Ragas version 0.2. The component integrates with the latest changes and enhancements in the Ragas framework.
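As a rough illustration of wiring up the new component (the constructor argument names and the metric choice here are assumptions, not taken from this PR):

```python
from haystack_integrations.components.evaluators.ragas import RagasEvaluator
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness

# Ragas 0.2 evaluates with an LLM wrapper, e.g. around a Langchain chat model.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

# Argument names (ragas_metrics, evaluator_llm) are assumed for this sketch.
evaluator = RagasEvaluator(
    ragas_metrics=[Faithfulness()],
    evaluator_llm=evaluator_llm,
)
```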