Move back to BM25 similarity #36431

javanna · 2018-12-10T13:18:38Z

With the last Lucene upgrade, we temporarily adopted LegacyBM25Similarity, which produces the same scores that BM25Similarity produced before the k1+1 factor was removed from the numerator of the scoring formula.

This PR changes the default Elasticsearch similarity back to BM25Similarity and updates the scores in our docs and tests that changed as a result.
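For reference, here is the per-term scoring formula as we understand the two Lucene similarities, ignoring query boosts. f is the term frequency in the document, dl the document length, avgdl the average document length, and k1 and b the usual BM25 parameters (Lucene defaults: k1 = 1.2, b = 0.75). Dropping the (k1 + 1) factor divides every score by the same constant. Ranking within one index is therefore unchanged, while absolute scores shrink (by a factor of 2.2 with the default k1).

```math
\begin{aligned}
\mathrm{score}_{\mathrm{legacy}}(t, d) &= \mathrm{idf}(t) \cdot \frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1\left(1 - b + b\,\frac{dl}{avgdl}\right)} \\
\mathrm{score}_{\mathrm{BM25}}(t, d) &= \mathrm{idf}(t) \cdot \frac{f_{t,d}}{f_{t,d} + k_1\left(1 - b + b\,\frac{dl}{avgdl}\right)} = \frac{\mathrm{score}_{\mathrm{legacy}}(t, d)}{k_1 + 1}
\end{aligned}
```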

elasticmachine · 2018-12-10T13:18:39Z

Pinging @elastic/es-search

romseygeek

One question about the SQL tests; looks good otherwise.

romseygeek · 2018-12-10T14:01:11Z

server/src/test/java/org/elasticsearch/search/nested/SimpleNestedIT.java

@@ -326,7 +326,7 @@ public void testExplain() throws Exception {
        assertThat(searchResponse.getHits().getTotalHits().value, equalTo(1L));
        Explanation explanation = searchResponse.getHits().getHits()[0].getExplanation();
        assertThat(explanation.getValue(), equalTo(searchResponse.getHits().getHits()[0].getScore()));
-        assertThat(explanation.toString(), startsWith("0.36464313 = Score based on 2 child docs in range from 0 to 1"));
+        assertThat(explanation.toString(), startsWith(explanation.getValue() + " = Score based on 2 child docs in range from 0 to 1"));
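The updated assertion derives the expected prefix from the explanation's own value, so the test no longer pins a similarity-specific number. The numeric check still happens in the preceding line, which compares the explanation value with the hit score. Below is a minimal sketch of that pattern. It assumes a hypothetical SimpleExplanation record that stands in for Lucene's Explanation, and assumes its toString() starts with a "<value> = <description>" line. It is not the real Lucene class.

```java
public class ExplanationPrefixCheck {
    // Hypothetical stand-in for org.apache.lucene.search.Explanation: just a value and a description.
    record SimpleExplanation(float value, String description) {
        // Assumed to mirror the "<value> = <description>" summary line that Explanation.toString() starts with.
        @Override
        public String toString() {
            return value + " = " + description;
        }
    }

    // Builds the expected prefix from the explanation's own value instead of a hardcoded score literal.
    static boolean hasExpectedPrefix(SimpleExplanation explanation, String description) {
        String expectedPrefix = explanation.value() + " = " + description;
        return explanation.toString().startsWith(expectedPrefix);
    }

    public static void main(String[] args) {
        String description = "Score based on 2 child docs in range from 0 to 1";
        // The literal from the old assertion, plus an illustrative smaller value standing in for a new score.
        float[] scores = {0.36464313f, 0.36464313f / 2.2f};
        for (float score : scores) {
            SimpleExplanation explanation = new SimpleExplanation(score, description);
            System.out.println(explanation + " -> " + hasExpectedPrefix(explanation, description));
        }
    }
}
```

The same check passes for either value, which is why the test keeps working whichever similarity produced the score.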


romseygeek · 2018-12-10T14:02:24Z

x-pack/plugin/sql/qa/src/main/resources/docs.csv-spec

-2.288635       |Frank Herbert  |Dune               |604            |1965-06-01T00:00:00Z
-1.8893257      |Frank Herbert  |Dune Messiah       |331            |1969-10-15T00:00:00Z
+1.0402887       |Frank Herbert  |Dune               |604            |1965-06-01T00:00:00Z
+0.8587844      |Frank Herbert  |Dune Messiah       |331            |1969-10-15T00:00:00Z
 1.6086555      |Frank Herbert  |Children of Dune   |408            |1976-04-21T00:00:00Z


This doesn't look right. All the scores should change, and currently the rows aren't in score order.
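The ordering concern can be checked mechanically against the score column of the csv-spec diff above. Here is a small sketch that uses a hypothetical helper, isSortedByScoreDescending, which is not part of the SQL test framework. The previous expected rows are in descending score order. The updated ones are not, because only the first two rows were rescaled.

```java
import java.util.List;

public class ScoreOrderCheck {
    // Hypothetical helper: true if scores never increase from one row to the next.
    static boolean isSortedByScoreDescending(List<Double> scores) {
        for (int i = 1; i < scores.size(); i++) {
            if (scores.get(i) > scores.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Score column of the csv-spec rows before and after this change.
        List<Double> previousRows = List.of(2.288635, 1.8893257, 1.6086555);
        List<Double> updatedRows = List.of(1.0402887, 0.8587844, 1.6086555);
        System.out.println(isSortedByScoreDescending(previousRows)); // true
        System.out.println(isSortedByScoreDescending(updatedRows));  // false
    }
}
```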

OK, I will double-check whether I fixed all the issues in the tests, and if so, I will look into why this is happening here.

OK, this is really weird. Two documents have updated scores, while the other two in this test (and in other tests that return the score) seem to keep the same score as before, which should not be the case with the updated BM25 similarity. Tests are green, which is what concerns me the most. @costin @nik9000, would you know why?

Let me add that I have just indexed the same documents used by these tests and verified manually that all four documents matching this query get an updated score with the BM25 change. Something weird seems to happen with these tests, but I'm not sure what exactly.

A mixed cluster with nodes on 6.x and 7?

Possibly, yet I would expect these tests to run in both multi-node and single-node mode, so I should not be getting a green build with the current changes: the single-node run alone should require updating all the scores. Maybe I'm not understanding how these tests are run.

@javanna indeed, there is something wrong here. The tests pass, and judging by the logs they produce, the csv-spec output does not seem to be compared with the actual output of the executed query. I'm looking into it.

@javanna the issue is here: an assertion with a 1.0 delta, which in the case of SCORE() is significant and a deal breaker.
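Here is a sketch of why a 1.0 delta hides the change, assuming the tests use Lucene's default k1 = 1.2. LegacyBM25Similarity scores equal BM25Similarity scores multiplied by (k1 + 1) = 2.2, so the updated score is the legacy score divided by 2.2. The two differ by more than 1.0 only when the legacy score exceeds 2.2 / 1.2 ≈ 1.83. That is consistent with the csv-spec diff. The rows at 2.288635 and 1.8893257 had to be updated (to 1.0402887 and 0.8587844, i.e. divided by 2.2), while Children of Dune at 1.6086555 still passed unchanged. withinDelta is a hypothetical helper that mirrors an assertEquals(expected, actual, delta) tolerance check. It is not the actual SQL test code.

```java
public class ScoreDeltaCheck {
    // Lucene's default BM25 k1 (assumption: the tests ran with default parameters).
    static final double K1 = 1.2;

    // LegacyBM25Similarity scores equal BM25Similarity scores times (k1 + 1),
    // so the updated score is the legacy score divided by (k1 + 1).
    static double updatedScore(double legacyScore) {
        return legacyScore / (K1 + 1);
    }

    // Hypothetical helper mirroring an assertEquals(expected, actual, delta) tolerance check.
    static boolean withinDelta(double expected, double actual, double delta) {
        return Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        // Legacy scores from the csv-spec diff: Dune, Dune Messiah, Children of Dune.
        double[] legacyScores = {2.288635, 1.8893257, 1.6086555};
        for (double legacy : legacyScores) {
            double updated = updatedScore(legacy);
            System.out.printf("legacy=%.7f updated=%.7f within 1.0 delta=%b%n",
                    legacy, updated, withinDelta(legacy, updated, 1.0));
        }
        // Legacy score above which a 1.0 delta detects the change: 1.0 / (1 - 1 / (k1 + 1)).
        System.out.printf("threshold=%.4f%n", 1.0 / (1 - 1 / (K1 + 1)));
    }
}
```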

I see, thanks for looking @astefan! Would you have the chance to fix this upstream?

Yep, I'll open a PR.

javanna · 2018-12-14T11:05:55Z

retest this please

romseygeek

LGTM, thanks @javanna!

This allows users to opt out of the updated BM25 similarity and use the deprecated legacy BM25 similarity instead. This especially helps cross-cluster search across clusters running different versions: otherwise, sorting by score would produce very odd results depending on the version of the data node that scores each document.
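The following hypothetical sketch illustrates the mixed-version concern. Two documents are scored with the same BM25 formula, but one node still applies the legacy (k1 + 1) factor (k1 = 1.2 assumed). When the hits are merged by raw score, the legacy node's document wins even though it is less relevant. The Hit record, node names, and merge helper are illustrative only and are not Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MixedVersionMerge {
    // Illustrative hit: document id, the node that scored it, and the reported score.
    record Hit(String docId, String node, double score) {}

    static final double K1 = 1.2;

    // A node on the older version reports scores with the legacy (k1 + 1) factor applied.
    static Hit score(String docId, String node, double bm25Score, boolean legacy) {
        return new Hit(docId, node, legacy ? bm25Score * (K1 + 1) : bm25Score);
    }

    // Coordinating-node style merge: concatenate per-node results and sort by score, descending.
    static List<Hit> mergeByScore(List<Hit> first, List<Hit> second) {
        List<Hit> merged = new ArrayList<>(first);
        merged.addAll(second);
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        // docA is more relevant (1.0) than docB (0.6) under the same BM25 formula...
        List<Hit> newNode = List.of(score("docA", "new-node", 1.0, false));
        // ...but docB is scored on a node that still uses the legacy similarity.
        List<Hit> legacyNode = List.of(score("docB", "legacy-node", 0.6, true));
        for (Hit hit : mergeByScore(newNode, legacyNode)) {
            System.out.println(hit);
        }
    }
}
```

Letting every cluster opt into the same (legacy) similarity removes the constant-factor mismatch, so merged scores become comparable again.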

javanna · 2019-01-08T13:30:19Z

retest this please

javanna · 2019-01-08T17:04:45Z

run gradle build tests 1

elasticsearchmachine · 2022-07-27T14:59:51Z

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine · 2024-07-17T19:36:35Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

javanna added 7 commits December 10, 2018 11:29

fix tests

6d92c11

add note about bm25 changes

1dea27f

update scores in docs

53dd1f3

update scores in docs

af56872

fix bad merge

b224461

fix sql tests

f09f5dd

javanna added >breaking :Search/Search v7.0.0 labels Dec 10, 2018

javanna requested a review from romseygeek December 10, 2018 13:18

romseygeek reviewed Dec 10, 2018

javanna added 2 commits December 14, 2018 10:27

Merge branch 'master' into enhancement/back_to_bm25

36c5827

fix remaining failures

90b97d8

javanna force-pushed the enhancement/back_to_bm25 branch from 6fbc166 to 90b97d8 December 14, 2018 10:20

revert changes to release-notes.asciidoc

95658de

javanna requested a review from jpountz December 14, 2018 10:23

javanna requested a review from romseygeek December 14, 2018 11:39

Merge branch 'master' into enhancement/back_to_bm25

edd1c91

romseygeek approved these changes Dec 14, 2018

javanna added 2 commits December 17, 2018 16:33

update release notes - mention mixed clusters

cfafb9e

Merge branch 'master' into enhancement/back_to_bm25

3b41212

javanna added the WIP label Jan 4, 2019

javanna added 3 commits January 4, 2019 10:46

additional test

bc5845d

Merge branch 'master' into enhancement/back_to_bm25

1d182c5

mark-vieira added v8.2.0 and removed v8.1.0 labels Feb 2, 2022

salvatore-campagna added v8.3.0 and removed v8.2.0 labels Mar 30, 2022

craigtaverner added v8.4.0 and removed v8.3.0 labels May 25, 2022

elasticsearchmachine changed the base branch from master to main July 22, 2022 23:14

mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022

csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022

kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022

rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023

gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023

pugnascotia added v8.10.0 and removed v8.9.0 labels Jun 22, 2023

quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023

mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023

javanna removed the v8.12.0 label Dec 6, 2023

javanna added :Search Foundations/Search and removed :Search/Search labels Jul 17, 2024

elasticsearchmachine added Team:Search Foundations and removed Team:Search labels Jul 17, 2024
