-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix IAE on cross_fields query introduced in 7.0.1 #41938
Conversation
If the max doc in the index is greater than the minimum total term frequency among the requested fields we need to adjust max doc to be equal to the min ttf. This was removed by mistake when fixing elastic#41125. Closes elastic#41934
Pinging @elastic/es-search |
@elasticmachine run elasticsearch-ci/1 |
@jimczi, thanks for fixing that. This pull request is labeled with v7.2.0. Does it mean this fix will come with v7.2.0 only? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can verify that this fixes the TTF>=DF assertion failure but I'm not certain whether lowering DF or raising TTF would be the right fix scoring-wise.
The main motivation for BlendedTermQuery was to fix the problem of multi-field-search favouring the wrong context for a term (default Lucene prefers lastName:mark over firstName:mark). To fix that we had to pick the most common field for a term and use that as the choice of DF.
So we want to take max(DF) of the fields containing a term to make it more boring.
Perhaps the fix should be to raise TTF rather than lower DF given our general motivation is to raise DF.
I confess I don't know how TTF is generally used in scoring - I know higher TFs are preferred because it indicates a single doc is repeatedly using a term but not sure what value is attached to TTF.
My assumption is fixing the assertion break by raising TTF is preferable to lowering DF given we want to make terms seem more boring, not rarer and therefore more interesting.
I'd like to merge this fix first since this is a bug introduced in a patch version and we can discuss better ways to handle this in a follow up ? We also want to switch to the new BM25FQuery in Lucene which doesn't alter the ttf or df artificially. |
That's the plan, yes. It's too late for 7.1.0 but the fix should be in the next release after that. |
@elasticmachine run elasticsearch-ci/1 |
@jimczi Why is this not being put into the very next patch release? The very point of a patch release is to fix bugs. Including this severe bug that has made search completely unusable for us. This bug was introduced in a patch release and has no good reason not to be fixed in a patch release. What was supposed to be a painless upgrade to 7.0.1 has literally broken search for us and we are supposed to wait for 2 more minor releases? Forget 7.2.0 and even 7.1.0, this needs to be a priority to make it into 7.0.2! |
If the max doc in the index is greater than the minimum total term frequency among the requested fields we need to adjust max doc to be equal to the min ttf. This was removed by mistake when fixing elastic#41125. Closes elastic#41934
In #89016 we adjusted the `cross_fields` scoring formula to prevent negative scores. This fix accidentally dropped another important fix that was added in #41938. Specifically, we need to make sure to take the minimum between the document frequency (`actualDf`) and the minimum total term frequency (`minTTF`). Otherwise, we can produce invalid term statistics where the total term frequency is less than the document frequency. Fixes #90275
In #89016 we adjusted the `cross_fields` scoring formula to prevent negative scores. This fix accidentally dropped another important fix that was added in #41938. Specifically, we need to make sure to take the minimum between the document frequency (`actualDf`) and the minimum total term frequency (`minTTF`). Otherwise, we can produce invalid term statistics where the total term frequency is less than the document frequency. Fixes #90275
In elastic#89016 we adjusted the `cross_fields` scoring formula to prevent negative scores. This fix accidentally dropped another important fix that was added in elastic#41938. Specifically, we need to make sure to take the minimum between the document frequency (`actualDf`) and the minimum total term frequency (`minTTF`). Otherwise, we can produce invalid term statistics where the total term frequency is less than the document frequency. Fixes elastic#90275
In #89016 we adjusted the `cross_fields` scoring formula to prevent negative scores. This fix accidentally dropped another important fix that was added in #41938. Specifically, we need to make sure to take the minimum between the document frequency (`actualDf`) and the minimum total term frequency (`minTTF`). Otherwise, we can produce invalid term statistics where the total term frequency is less than the document frequency. Fixes #90275
If the max doc in the index is greater than the minimum total term frequency
among the requested fields we need to adjust max doc to be equal to the min ttf.
This was removed by mistake when fixing #41125.
Closes #41934