Ability to set percentage influence of each function in function score query #15670

Vineeth-Mohan · 2015-12-26T06:57:16Z

The function score query is a good facility for shaping various aspects of the score, but it does not give direct control over how much each function influences the result.
For example, take the query below:

{
  "query": {
    "function_score": {
      "functions": [
        {
          "decay": {
            "gauss": {
              "date": {
                "origin": "2013-09-17",
                "scale": "10d",
                "offset": "5d",
                "decay": 0.5
              }
            }
          }
        },
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.2,
            "modifier": "sqrt",
            "missing": 1
          }
        },
        {
          "random_score": {}
        },
        {
          "script_score": {
            "script": {
              "lang": "lang",
              "params": {
                "param1": 2,
                "param2": 3
              },
              "inline": "_score * doc['rating'].value / pow(param1, param2)"
            }
          }
        }
      ]
    }
  }
}

There are 4 functions and together they dictate the final score. Any one of them, for example the script_score function, can eat up all the influence on the score. Say the value of script_score is in the range 1000 to 2000 while the value of the decay function is between 0 and 1. The influence of the decay function is then not really passed on to the final score; script_score dominates, and the rest of the functions have little or no influence.
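
To make the imbalance concrete, here is a minimal Python sketch using the ranges mentioned above (a decay value between 0 and 1, a script_score value between 1000 and 2000). The sample documents and the choice of adding the function values together are assumptions for illustration; this is plain arithmetic, not Elasticsearch code.

```python
# Hypothetical per-document function outputs, using the ranges from the text.
docs = [
    {"decay": 0.10, "script": 1900.0},
    {"decay": 0.95, "script": 1200.0},
]

def summed_score(doc):
    # Assumes the function values are simply added together.
    return doc["decay"] + doc["script"]

def decay_share(doc):
    # Fraction of the summed score contributed by the decay function.
    return doc["decay"] / summed_score(doc)

# Rank documents by the combined score, highest first.
ranked = sorted(docs, key=summed_score, reverse=True)
for doc in ranked:
    print(doc, round(summed_score(doc), 2), f"{decay_share(doc):.4%}")
```

The document with the weaker decay value still ranks first, and in both documents the decay function accounts for well under 0.1% of the summed score.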

To fix this, it might be useful to have an influenceScore factor per function that tells what percentage of the final score this function should influence.
For example, the above query could be rewritten as:

{
  "query": {
    "function_score": {
      "functions": [
        {
          "influenceScore": "40%",
          "decay": {
            "gauss": {
              "date": {
                "origin": "2013-09-17",
                "scale": "10d",
                "offset": "5d",
                "decay": 0.5
              }
            }
          }
        },
        {
          "influenceScore": "30%",
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.2,
            "modifier": "sqrt",
            "missing": 1
          }
        },
        {
          "influenceScore": "10%",
          "random_score": {}
        },
        {
          "influenceScore": "20%",
          "script_score": {
            "script": {
              "lang": "lang",
              "params": {
                "param1": 2,
                "param2": 3
              },
              "inline": "_score * doc['rating'].value / pow(param1, param2)"
            }
          }
        }
      ]
    }
  }
}

Here, each function has an influenceScore that dictates its influence. This would help in further fine-tuning the score.

s1monw · 2015-12-28T09:09:54Z

Can't you just use the weight attribute of a function? Instead of influenceScore: 20% you do weight: 0.2.

s1monw · 2016-01-07T16:46:37Z

@Vineeth-Mohan ping

Vineeth-Mohan · 2016-01-08T02:51:39Z

Hello @s1monw ,

Let me walk through the motivation here.
Let's say I am running the following query:

{
  "explain": true,
  "query": {
    "function_score": {
      "functions": [
        {
          "field_value_factor": {
            "field": "dateOfJoining",
            "modifier": "sqrt",
            "missing": 1
          }
        },
        {
          "random_score": {}
        }
      ],
      "score_mode": "sum"
    }
  }
}

With this, I am seeing the following result:

{
  "_explanation": {
    "value": 1113172.4,
    "description": "function score, product of:",
    "details": [
      {
        "value": 1,
        "description": "ConstantScore(*:*), product of:",
        "details": [
          {
            "value": 1,
            "description": "boost"
          },
          {
            "value": 1,
            "description": "queryNorm"
          }
        ]
      },
      {
        "value": 1113172.4,
        "description": "Math.min of",
        "details": [
          {
            "value": 1113172.4,
            "description": "function score, score mode [sum]",
            "details": [
              {
                "value": 1113172.2,
                "description": "function score, product of:",
                "details": [
                  {
                    "value": 1,
                    "description": "match filter: *:*"
                  },
                  {
                    "value": 1113172.2,
                    "description": "field value function: sqrt(doc['dateOfJoining'].value?:1.0 * factor=1.0)"
                  }
                ]
              },
              {
                "value": 0.17271471,
                "description": "function score, product of:",
                "details": [
                  {
                    "value": 1,
                    "description": "match filter: *:*"
                  },
                  {
                    "value": 0.17271471,
                    "description": "random score function (seed: 519896482)"
                  }
                ]
              }
            ]
          },
          {
            "value": 3.4028235e+38,
            "description": "maxBoost"
          }
        ]
      },
      {
        "value": 1,
        "description": "queryBoost"
      }
    ]
  }
}

As you can see, the score from field_value_factor always overshadows the score from random_score, so random_score has no relevance here.

My motivation for this issue came from this problem.
One solution would be to use the weight to normalize the values, and that is how it's currently done.
But looking into the range of values for each function and then working out the weights for all the functions manually seems hard. And weights computed manually might not be applicable across all documents.

The percentage suggestion was based on this, but I am finding it difficult to pen down the maths behind it. The only solution I found was to find the range of the scores given by each function across all documents and use that for the percentage influence. But as scoring is per document, that won't be feasible.
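
The range-based idea described above can be sketched offline as follows. It assumes the minimum and maximum of each function's output are already known (for example from a prior pass over the data). The ranges, percentages, and helper names are hypothetical, and nothing here is an existing Elasticsearch feature.

```python
def min_max(value, lo, hi):
    """Rescale value from [lo, hi] to [0, 1]; a constant range maps to 0."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)

def influence_score(raw, ranges, influence):
    """Combine normalized function outputs using percentage influences."""
    total = sum(influence.values())
    return sum(
        influence[name] / total * min_max(raw[name], *ranges[name])
        for name in raw
    )

# Assumed (hypothetical) output ranges per function across all documents.
ranges = {"field_value_factor": (1000000.0, 1200000.0), "random_score": (0.0, 1.0)}
# Desired percentage influence per function (hypothetical).
influence = {"field_value_factor": 70, "random_score": 30}
# Raw function outputs for one document, taken from the explain output above.
raw = {"field_value_factor": 1113172.2, "random_score": 0.17271471}

score = influence_score(raw, ranges, influence)
print(round(score, 4))
```

Because every function is rescaled to [0, 1] before the percentages are applied, the combined score also stays in [0, 1]. The catch is the one already noted: the ranges have to be known before scoring.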

Let me know your thoughts on the subject.

s1monw · 2016-01-08T08:19:59Z

@Vineeth-Mohan I can see what you are saying and I admit it can be challenging. I personally don't see a good general way to apply normalization here. I see the function score feature as a toolset of primitives that lets / forces the user to ensure that each element of the equation has its relevant weight etc. I wonder if others, e.g. @brwe, have some ideas?

brwe · 2016-01-08T14:07:35Z

It seems to me this is a case of "learning to rank". To find proper weights you would need to know what the expected ordering of results for different queries would be and then tune the weights accordingly. Without that, the only thing you can do now is guess.
We currently have no way to scale functions so that they are comparable either. This is something you will have to do in advance. In case you don't know, aggregations can help with that; see the example below. Other than that, we currently have no support for tuning the weights automatically.

{
  "query": {
    "function_score": {
      "functions": [
        {
          "random_score": {},
          "weight": 1000
        }
      ]
    }
  },
  "aggs": {
    "score_agg": {
      "histogram": {
        "script": "_score",
        "interval": 50
      }
    },
    "score_stats": {
      "extended_stats": {
        "script": "_score"
      }
    }
  }
}
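
Once such statistics are available, one possible way to use them is to pick a weight that brings a function's observed maximum down to a chosen target. The sketch below assumes a response fragment with `min`, `max`, and `avg` keys under the `score_stats` aggregation. The numbers and the helper `weight_for_target` are made up for illustration.

```python
# Hypothetical fragment of a search response for the "score_stats" aggregation.
response = {
    "aggregations": {
        "score_stats": {"count": 5000, "min": 0.2, "max": 998.7, "avg": 501.3}
    }
}

def weight_for_target(stats, target_max=1.0):
    """Return a multiplier that scales the observed maximum to target_max."""
    observed_max = stats["max"]
    if observed_max <= 0:
        raise ValueError("observed maximum must be positive")
    return target_max / observed_max

stats = response["aggregations"]["score_stats"]
weight = weight_for_target(stats)
print(weight, stats["max"] * weight)
```

A weight derived this way only reflects the documents seen when the statistics were collected, so it would need to be recomputed as the data changes.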

gkop · 2016-02-09T19:44:07Z

@brwe another benefit of what's proposed here if I understand correctly is one could use score_mode avg which could be weighted by influenceScore to generate scores nicely distributed on a range. This can be accounted for now in the client by passing influenceScore as a param to our script (which multiplies it by the nicely distributed intermediate score), and keeping a running sum of the influence scores, but it would be quite amazing if the server took care of it for us instead.

In fact, on reading the docs at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html , I initially interpreted that we could pass weight in as an option to any kind of function_score score function to obtain this behavior, that just made sense to me. Alas I misunderstood.
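
The client-side bookkeeping described above (multiply each intermediate score by its influence, keep a running sum of the influences, divide at the end) amounts to a weighted average. A minimal sketch, with made-up scores and weights:

```python
def weighted_average(scores, weights):
    """Return sum(w * s) / sum(w) for paired scores and weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight

# Two hypothetical function scores with influences 3 and 4.
result = weighted_average([1.0, 2.0], [3.0, 4.0])
print(result)
```

Dividing by the sum of the weights rather than by the number of functions keeps the result inside the range of the input scores, which is what makes the output "nicely distributed" when the inputs already are.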

clintongormley · 2016-02-29T20:58:03Z

The only other thing I could suggest is to apply a min/max score to each function, e.g. you could force gauss to be between 0 and 2. With that, the weights would be easier to adjust.
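
As a rough illustration of that suggestion, the sketch below clamps each function's output to a fixed interval before applying the weights. The bounds, weights, and raw values are hypothetical, and a per-function min/max is the proposal under discussion rather than an existing option.

```python
def clamp(value, lo, hi):
    """Force value into the closed interval [lo, hi]."""
    return max(lo, min(value, hi))

# Hypothetical bounds and weights per function; raw outputs are made up.
bounds = {"gauss": (0.0, 2.0), "script_score": (0.0, 2.0)}
weights = {"gauss": 0.4, "script_score": 0.2}
raw = {"gauss": 0.8, "script_score": 1500.0}

# Clamp first, then combine with the weights.
bounded = {name: clamp(value, *bounds[name]) for name, value in raw.items()}
total = sum(weights[name] * bounded[name] for name in bounded)
print(bounded, total)
```

With every function confined to the same interval, the weights become directly comparable. The trade-off is that clamping discards any differences above the cap.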

mayya-sharipova · 2018-03-21T20:15:46Z

Closing this in favour of #27588, where one of the desired features could be to normalize scores

s1monw added the feedback_needed label Jan 7, 2016

clintongormley added the :Query DSL label Jan 10, 2016

clintongormley added discuss and removed feedback_needed labels Feb 29, 2016

polyfractal mentioned this issue Nov 29, 2017

Replacing the function_score with discrete queries #27588

Closed

clintongormley added the :Search/Search label and removed the :Query DSL label Feb 14, 2018

mayya-sharipova closed this as completed Mar 21, 2018
