To throw a link into the mix: last week I read the Medium article "Top Evaluation Metrics for RAG Failures".
It could be relevant to this conversation, and at the very least you'll know where I got any fancy new (and hard-to-code) ideas from.
Furthermore, I met an open-source company in San Jose called TruEra. They focus on evaluating LLMs and RAG pipelines. I proposed that we work together, with our future (yet unconfirmed) Wikidata-VectorDB as a use case.
If we ask them to help us evaluate our RAG pipeline, we could start the collaboration earlier, which makes it more likely that the long-term collaboration stays viable.
I think it would be interesting to evaluate the performance of the pipeline at different stages.
For the last GB&C, Silvan and I implemented something very simple but conceptually similar for the askwikidata prototype:
https://github.com/rti/askwikidata/blob/main/eval.py
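Roughly, that kind of simple check amounts to running a handful of questions with known answers through the pipeline and scoring the output. A minimal sketch of the pattern (not the actual eval.py; the `answer_question` entry point and the example cases here are made up):

```python
# Minimal golden-answer evaluation sketch, not the actual askwikidata eval.py.
# `answer_question` and EVAL_CASES are hypothetical placeholders.
from typing import Callable

# Each case pairs a question with substrings we expect in a correct answer.
EVAL_CASES = [
    {"question": "When was Wikidata launched?", "expected": ["2012"]},
    {"question": "Who is the current mayor of Berlin?", "expected": ["Kai Wegner"]},
]

def run_eval(answer_question: Callable[[str], str]) -> float:
    """Run every case through the pipeline and report the pass rate."""
    passed = 0
    for case in EVAL_CASES:
        answer = answer_question(case["question"])
        ok = all(s.lower() in answer.lower() for s in case["expected"])
        print(f"[{'PASS' if ok else 'FAIL'}] {case['question']} -> {answer!r}")
        passed += ok
    return passed / len(EVAL_CASES)

if __name__ == "__main__":
    # Stub pipeline so the script runs standalone; replace with the real RAG chain.
    score = run_eval(lambda q: "Wikidata was launched in 2012.")
    print(f"pass rate: {score:.0%}")
```

Splitting the cases by stage (did retrieval surface the right item vs. did the final answer contain it) would give the per-stage view mentioned above.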
There are also frameworks such as Ragas that might help: https://docs.ragas.io/en/latest/getstarted/evaluation.html#metrics
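For reference, a rough sketch of what a Ragas run looks like per the getting-started page linked above; the metric imports and dataset column names differ between Ragas versions, the sample data is invented, and the LLM-based metrics need an LLM configured (e.g. an OpenAI API key in the environment):

```python
# Sketch of scoring a RAG pipeline with Ragas, loosely following its
# getting-started guide. Column names vary between versions; data is made up.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

samples = {
    "question": ["When was Wikidata launched?"],
    "contexts": [["Wikidata is a knowledge base launched on 29 October 2012."]],
    "answer": ["Wikidata was launched in October 2012."],
    "ground_truth": ["Wikidata was launched on 29 October 2012."],
}

# LLM-based metrics call out to a configured LLM, so this needs API credentials.
result = evaluate(
    Dataset.from_dict(samples),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # per-metric scores
```

Its retrieval-focused metrics (context precision/recall) and generation-focused metrics (faithfulness, answer relevancy) would map naturally onto evaluating the pipeline at different stages.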