Move retriever probability calculations to document_store #389

tanaysoni · 2020-09-17T12:10:40Z

This PR moves the retrieval probability calculations (a pseudo-probability obtained by scaling query scores) to the respective document stores.

It simplifies the similarity matching method (get_answers_via_similar_questions()) in the Finder. Additionally, retrieved documents now have an explicit probability field that could be useful in the future.
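
To make "pseudo-probability by scaling query scores" concrete, here is a minimal sketch of two common ways to squash a retriever score into the range 0 to 1. The function names and the `scale` constant are assumptions for illustration only; the formulas each document store actually uses are not shown in this thread.

```python
import math


def scale_to_probability(score: float, scale: float = 8.0) -> float:
    """Squash an unbounded, non-negative relevance score into (0, 1).

    Uses a logistic function. `scale` is a hypothetical tuning constant,
    not a value taken from this PR.
    """
    return 1.0 / (1.0 + math.exp(-score / scale))


def cosine_to_probability(score: float) -> float:
    """Map a cosine similarity in [-1, 1] linearly onto [0, 1]."""
    return (score + 1.0) / 2.0
```

Neither result is a calibrated probability; both only preserve the ranking of the raw scores while putting them on a bounded scale.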

tholor

Looking good.
Only comment: do we want to keep the naming like this (probability and query_score), or make it more consistent, e.g. probability and score?

tanaysoni · 2020-09-17T12:50:53Z

score sounds more appropriate. I'll change it.

guillim · 2020-10-26T10:03:49Z

Hi, I read your PR and I like the renaming for simplicity, but I ended up a bit confused.

In haystack/schema.py I can read:

        :param score: Retriever's query score for a retrieved document
        :param probability: a psuedo probability by scaling score in the range 0 to 1

So score is the retriever score.
And probability is the retriever score, scaled from 0 to 1.

So... where is the Reader score? I thought it was the probability property, but I'm no longer sure after reading this. Also, if no score is returned for the reader, does that mean the results are sorted by their retriever score only? Am I getting this wrong?

guillim · 2020-10-26T10:22:53Z

OK, my bad: we are talking about the Document schema here, so the Reader results won't appear in it. I think I was confused because the exact same property names are used in the Reader, for example in the predict() function in haystack/reader/transformers.py:

        "answer": pred["answer"],
        "context": doc.text[context_start:context_end],
        "offset_start": pred["start"],
        "offset_end": pred["end"],
        "probability": pred["score"],
        "score": None,
        "document_id": doc.id,
        "meta": doc.meta

guillim · 2020-10-26T10:26:15Z

But that means that, at the moment, no information about the retriever score is passed to the answers. Is that correct, @tholor?

tholor · 2020-10-26T10:54:04Z

@guillim Yes, that's correct.
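
Since the answers carry a document_id but no retriever score, a caller who needs both could join them after the fact. The following is only a sketch under stated assumptions: answers and documents are treated as plain dicts, and the helper name and the `retriever_score` / `retriever_probability` keys are hypothetical, not part of Haystack.

```python
from typing import Any, Dict, List, Optional


def attach_retriever_scores(
    answers: List[Dict[str, Any]],
    documents: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return copies of the answers with retriever fields added.

    Answers are matched to retrieved documents via `document_id`.
    The added keys are hypothetical names chosen for this sketch.
    """
    by_id = {doc["id"]: doc for doc in documents}
    enriched = []
    for answer in answers:
        doc: Optional[Dict[str, Any]] = by_id.get(answer.get("document_id"))
        merged = dict(answer)  # shallow copy, the input is left untouched
        merged["retriever_score"] = doc["score"] if doc else None
        merged["retriever_probability"] = doc["probability"] if doc else None
        enriched.append(merged)
    return enriched
```

Answers whose document_id has no matching document simply get None for both added fields.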

Move probability calculations to document stores

58969ad

tanaysoni requested a review from tholor September 17, 2020 12:25

tholor reviewed Sep 17, 2020

tanaysoni added 2 commits September 17, 2020 15:40

Rename query_score to score

1c406c2

Rename query_score to score

de5c0db

tholor mentioned this pull request Sep 17, 2020

ValidationError from haystack API when attempting curl request to doc-qa #387

Closed

tholor approved these changes Sep 17, 2020

tanaysoni merged commit 06243db into master Sep 17, 2020

tanaysoni deleted the refactor-probability branch September 17, 2020 14:25
