Move retriever probability calculations to document_store #389

Merged
merged 3 commits into from
Sep 17, 2020

Conversation

tanaysoni
Contributor

This PR moves the retrieval probability calculation (a pseudo-probability obtained by scaling query scores) into the respective document stores.

It simplifies the similarity-matching method (get_answers_via_similar_questions()) in the Finder. Additionally, retrieved documents now have an explicit probability field, which could be useful in the future.
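As a rough illustration of what "scaling query scores" into a pseudo-probability can look like, here is a minimal sketch. The PR description does not state the formula, so the logistic squashing, the `scale` divisor of 8, the cosine mapping, and both function names are assumptions made for this example, not the document stores' actual code.

```python
import math


def logistic_probability(score: float, scale: float = 8.0) -> float:
    """Squash an unbounded retriever score into the range 0 to 1.

    Hypothetical helper: `scale` is an assumed constant that controls how
    quickly the output saturates.
    """
    x = score / scale
    # Numerically stable logistic function: avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cosine_probability(score: float) -> float:
    """Map a cosine similarity in [-1, 1] linearly onto [0, 1].

    Hypothetical helper for stores whose scores are already bounded.
    """
    return (score + 1.0) / 2.0


if __name__ == "__main__":
    for raw in (0.0, 4.0, 16.0):
        print(raw, logistic_probability(raw))
```

Either mapping is monotonic, so ranking documents by probability gives the same order as ranking them by the raw score; the value is only a convenience for display and thresholding, not a calibrated probability.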

@tanaysoni tanaysoni requested a review from tholor September 17, 2020 12:25
Member

@tholor tholor left a comment

Looking good.
My only comment: do we want to keep the naming as it is (probability and query_score), or make it more consistent, e.g. probability and score?

@tanaysoni
Contributor Author

score sounds more appropriate. I'll change it.

@tanaysoni tanaysoni merged commit 06243db into master Sep 17, 2020
@tanaysoni tanaysoni deleted the refactor-probability branch September 17, 2020 14:25
@guillim
Contributor

guillim commented Oct 26, 2020

Hi, I read your PR and I like the renaming for its simplicity. However, I ended up a bit confused.

In haystack/schema.py I can read:

        :param score: Retriever's query score for a retrieved document
        :param probability: a pseudo probability by scaling score in the range 0 to 1

So score is the retriever score, and probability is the retriever score scaled to the range 0 to 1.

So... where is the Reader score? I thought it was the probability property, but I'm no longer sure after reading this. Also, if no Reader score is returned, then the results are sorted by their retriever score only. Am I getting this wrong?

@guillim
Contributor

guillim commented Oct 26, 2020

OK, my bad: we are talking about the Document schema here, so the Reader results won't appear in it. I think I was confused because the exact same property names are used in the Reader, for example in the predict() function in haystack/reader/transformers.py:

"answer": pred["answer"],
"context": doc.text[context_start:context_end],
"offset_start": pred["start"],
"offset_end": pred["end"],
"probability": pred["score"],
"score": None,
"document_id": doc.id,
"meta": doc.meta

@guillim
Contributor

guillim commented Oct 26, 2020

But this means that, at the moment, no information about the retriever score is passed on to the answers. Is that correct, @tholor?

@tholor
Member

tholor commented Oct 26, 2020

@guillim Yes, that's correct.
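
For readers who need the retriever score next to each answer in the meantime, a possible workaround is to join answers and retrieved documents on document_id after the fact. This is only a sketch under stated assumptions: answers and documents are modelled as plain dicts, and `attach_retriever_scores`, `retriever_score`, and `retriever_probability` are hypothetical names that are not part of Haystack.

```python
from typing import Any, Dict, List


def attach_retriever_scores(
    answers: List[Dict[str, Any]],
    documents: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return copies of the answers, enriched with retriever scores.

    Hypothetical helper: each document dict is assumed to carry "id",
    "score" and "probability"; each answer dict is assumed to carry
    "document_id", as in the predict() snippet quoted in this thread.
    """
    docs_by_id = {doc["id"]: doc for doc in documents}
    enriched = []
    for answer in answers:
        doc = docs_by_id.get(answer.get("document_id"), {})
        merged = dict(answer)  # leave the caller's answer untouched
        merged["retriever_score"] = doc.get("score")
        merged["retriever_probability"] = doc.get("probability")
        enriched.append(merged)
    return enriched


if __name__ == "__main__":
    docs = [{"id": "d1", "score": 12.3, "probability": 0.82}]
    answers = [{"answer": "Berlin", "probability": 0.91, "score": None,
                "document_id": "d1"}]
    print(attach_retriever_scores(answers, docs))
```

Keeping the retriever values under separate keys avoids overloading the answer's own probability and score fields, which was the source of the confusion discussed above.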

3 participants