Add the ability to set the number of hits to track accurately #36357

Merged 14 commits on Jan 4, 2019
2 changes: 2 additions & 0 deletions docs/reference/search/request-body.asciidoc
@@ -189,6 +189,8 @@ include::request/from-size.asciidoc[]

include::request/sort.asciidoc[]

include::request/track-total-hits.asciidoc[]

include::request/source-filtering.asciidoc[]

include::request/stored-fields.asciidoc[]
176 changes: 176 additions & 0 deletions docs/reference/search/request/track-total-hits.asciidoc
@@ -0,0 +1,176 @@
[[search-request-track-total-hits]]
=== Track total hits

Generally the total hit count can't be computed accurately without visiting all
matches, which is costly for queries that match lots of documents. The
`track_total_hits` parameter allows you to control how the total number of hits
should be tracked. When set to `true` the search response will always track the
number of hits that match the query accurately (i.e. `total.relation` will always
be equal to `"eq"` when `track_total_hits` is set to `true`).

[source,js]
--------------------------------------------------
GET twitter/_search
{
"track_total_hits": true,
"query": {
"match" : {
"message" : "Elasticsearch"
}
}
}
--------------------------------------------------
// TEST[setup:twitter]
// CONSOLE

\... returns:

[source,js]
--------------------------------------------------
{
"_shards": ...
"timed_out": false,
"took": 100,
"hits": {
"max_score": 1.0,
"total" : {
"value": 2048, <1>
"relation": "eq" <2>
},
"hits": ...
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"took": 100/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"value": 2048/"value": $body.hits.total.value/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]

<1> The total number of hits that match the query.
<2> The count is accurate (i.e. `"eq"` means equals).

If you don't need to track the total number of hits you can improve query times
by setting this option to `false`. In that case the search can efficiently skip
non-competitive hits because it doesn't need to count all matches:

[source,js]
--------------------------------------------------
GET twitter/_search
{
"track_total_hits": false,
"query": {
"match" : {
"message" : "Elasticsearch"
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

\... returns:

[source,js]
--------------------------------------------------
{
"_shards": ...
"timed_out": false,
"took": 10,
"hits" : { <1>
"max_score": 1.0,
"hits": ...
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"took": 10/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]

<1> The total number of hits is unknown.

Given that it is often enough to have a lower bound of the number of hits,
such as "there are at least 1000 hits", it is also possible to set
`track_total_hits` to an integer that represents the number of hits to count
accurately. The search can efficiently skip non-competitive documents as soon
as it has collected at least `track_total_hits` documents. This is a good
trade-off to speed up searches if you don't need the accurate number of hits
beyond a certain threshold.


For instance the following query will accurately track the total number of
hits that match the query up to 100 documents:

[source,js]
--------------------------------------------------
GET twitter/_search
{
"track_total_hits": 100,
"query": {
"match" : {
"message" : "Elasticsearch"
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

The `hits.total.relation` in the response will indicate if the
value returned in `hits.total.value` is accurate (`eq`) or a lower
bound of the total (`gte`).

For instance the following response:

[source,js]
--------------------------------------------------
{
"_shards": ...
"timed_out": false,
"took": 30,
"hits" : {
"max_score": 1.0,
"total" : {
"value": 42, <1>
"relation": "eq" <2>
},
"hits": ...
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"took": 30/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"value": 42/"value": $body.hits.total.value/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]

<1> 42 documents match the query
<2> and the count is accurate (`"eq"`)

\... indicates that the number of hits returned in the `total`
is accurate.

If the total number of hits that match the query is greater than the
value set in `track_total_hits`, the total hits in the response
will indicate that the returned value is a lower bound:

[source,js]
--------------------------------------------------
{
"_shards": ...
"hits" : {
"max_score": 1.0,
"total" : {
"value": 100, <1>
"relation": "gte" <2>
},
"hits": ...
}
}
--------------------------------------------------
// TESTRESPONSE
// TEST[skip:response is already tested in the previous snippet]

<1> There are at least 100 documents that match the query
<2> This is a lower bound (`gte`).
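
The `eq`/`gte` semantics documented above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `TotalHitsSketch`, `Total`, `reportedTotal` and `describe` are hypothetical names, not Elasticsearch code, and the true match count is passed in directly, whereas a real search does not know it up front.

```java
// Hypothetical illustration of the documented semantics; not Elasticsearch source code.
final class TotalHitsSketch {

    /** Mirrors the shape of `hits.total` in the response: a value plus a relation. */
    static final class Total {
        final long value;
        final String relation; // "eq" (accurate) or "gte" (lower bound)

        Total(long value, String relation) {
            this.value = value;
            this.relation = relation;
        }
    }

    /**
     * Maps a match count to the reported total for an integer `track_total_hits`.
     * Assumption: the full match count is known here, purely for illustration.
     */
    static Total reportedTotal(long matches, int threshold) {
        if (matches > threshold) {
            // More matches than the threshold: report the threshold as a lower bound.
            return new Total(threshold, "gte");
        }
        // At or below the threshold: the count is accurate.
        return new Total(matches, "eq");
    }

    /** Renders a total the way a UI might, e.g. "at least 100 hits". */
    static String describe(Total total) {
        String prefix = "gte".equals(total.relation) ? "at least " : "";
        return prefix + total.value + " hits";
    }

    public static void main(String[] args) {
        System.out.println(describe(reportedTotal(42, 100)));   // prints "42 hits"
        System.out.println(describe(reportedTotal(2048, 100))); // prints "at least 100 hits"
    }
}
```

A client that only needs to show "at least N results" can therefore rely on `relation` alone and never request an accurate count.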
8 changes: 5 additions & 3 deletions docs/reference/search/uri-request.asciidoc
@@ -101,10 +101,12 @@ is important).
|`track_scores` |When sorting, set to `true` in order to still track
scores and return them as part of each hit.

|`track_total_hits` |Set to `false` in order to disable the tracking
|`track_total_hits` |Defaults to true. Set to `false` in order to disable the tracking
of the total number of hits that match the query.
(see <<index-modules-index-sorting,_Index Sorting_>> for more details).
Defaults to true.
It also accepts an integer which in this case represents the number of

Contributor comment:

@jimczi, this is problematic in strongly-typed languages, since the `track_total_hits` parameter is defined as a boolean, and e.g. the Go test suite fails with `cannot use 4 (type int) as type bool in field value`.

I understand the motivation here, but we should at least revisit the parameter definitions and support something like `"type": ["boolean", "number"]`, so the code generators can make decisions here.
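
To make the concern concrete, here is a minimal Java sketch of how a typed client could model a parameter that is either a boolean or an integer. `TrackTotalHitsValue` and its methods are hypothetical names chosen for illustration. They are not part of Elasticsearch or of any generated client, and no validation of the integer range is attempted.

```java
// Hypothetical sketch of a boolean-or-integer parameter; not an existing client API.
final class TrackTotalHitsValue {
    private final Boolean flag;  // non-null when the value is a boolean
    private final Integer upTo;  // non-null when the value is an integer threshold

    private TrackTotalHitsValue(Boolean flag, Integer upTo) {
        this.flag = flag;
        this.upTo = upTo;
    }

    /** Boolean form: track accurately (true) or not at all (false). */
    static TrackTotalHitsValue of(boolean flag) {
        return new TrackTotalHitsValue(flag, null);
    }

    /** Integer form: count accurately up to the given number of hits. */
    static TrackTotalHitsValue upTo(int threshold) {
        return new TrackTotalHitsValue(null, threshold);
    }

    /** Parses the textual form used in a URI query string. */
    static TrackTotalHitsValue parse(String raw) {
        if ("true".equals(raw) || "false".equals(raw)) {
            return of(Boolean.parseBoolean(raw));
        }
        // Anything else must be an integer; parseInt throws NumberFormatException otherwise.
        return upTo(Integer.parseInt(raw));
    }

    boolean isBoolean() {
        return flag != null;
    }

    /** Renders the value as it would appear in a query string. */
    String asParameter() {
        return isBoolean() ? flag.toString() : upTo.toString();
    }
}
```

With a wrapper like this, `TrackTotalHitsValue.upTo(4)` type-checks where a plain `bool` field would not, which is the kind of decision a `["boolean", "number"]` type definition would let a code generator make.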

hits to count accurately.
(See the <<search-request-track-total-hits, request body>> documentation
for more details).

|`timeout` |A search timeout, bounding the search request to be executed
within the specified time value and bail with the hits accumulated up to

@@ -49,7 +49,7 @@ public class RestMultiSearchTemplateAction extends BaseRestHandler {

static {
final Set<String> responseParams = new HashSet<>(
Arrays.asList(RestSearchAction.TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HIT_AS_INT_PARAM)
Arrays.asList(RestSearchAction.TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HITS_AS_INT_PARAM)
);
RESPONSE_PARAMS = Collections.unmodifiableSet(responseParams);
}
@@ -103,6 +103,7 @@ public static MultiSearchTemplateRequest parseRequest(RestRequest restRequest, b
} else {
throw new IllegalArgumentException("Malformed search template");
}
RestSearchAction.checkRestTotalHits(restRequest, searchRequest);
});
return multiRequest;
}

@@ -43,7 +43,7 @@ public class RestSearchTemplateAction extends BaseRestHandler {
private static final Set<String> RESPONSE_PARAMS;

static {
final Set<String> responseParams = new HashSet<>(Arrays.asList(TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HIT_AS_INT_PARAM));
final Set<String> responseParams = new HashSet<>(Arrays.asList(TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HITS_AS_INT_PARAM));
RESPONSE_PARAMS = Collections.unmodifiableSet(responseParams);
}

@@ -77,6 +77,7 @@ public RestChannelConsumer prepareRequest(RestRequest request, NodeClient client
searchTemplateRequest = SearchTemplateRequest.fromXContent(parser);
}
searchTemplateRequest.setRequest(searchRequest);
RestSearchAction.checkRestTotalHits(request, searchRequest);

return channel -> client.execute(SearchTemplateAction.INSTANCE, searchTemplateRequest, new RestStatusToXContentListener<>(channel));
}

@@ -30,7 +30,7 @@
import java.nio.charset.StandardCharsets;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
import static org.elasticsearch.rest.action.search.RestSearchAction.TOTAL_HIT_AS_INT_PARAM;
import static org.elasticsearch.rest.action.search.RestSearchAction.TOTAL_HITS_AS_INT_PARAM;
import static org.hamcrest.Matchers.equalTo;

/**
@@ -158,7 +158,7 @@ private void bulk(String index, String valueSuffix, int count) throws IOExceptio

private void assertCount(String index, int count) throws IOException {
Request searchTestIndexRequest = new Request("POST", "/" + index + "/_search");
searchTestIndexRequest.addParameter(TOTAL_HIT_AS_INT_PARAM, "true");
searchTestIndexRequest.addParameter(TOTAL_HITS_AS_INT_PARAM, "true");
searchTestIndexRequest.addParameter("filter_path", "hits.total");
Response searchTestIndexResponse = client().performRequest(searchTestIndexRequest);
assertEquals("{\"hits\":{\"total\":" + count + "}}",

@@ -115,11 +115,45 @@ setup:
- query:
match: {foo: foo}

- match: { responses.0.hits.total.value: 2 }
- match: { responses.0.hits.total.relation: eq }
- match: { responses.1.hits.total.value: 1 }
- match: { responses.1.hits.total.relation: eq }
- match: { responses.2.hits.total.value: 1 }
- match: { responses.2.hits.total.relation: eq }

- do:
msearch:
body:
- index: index_*
- { query: { match: {foo: foo}}, track_total_hits: 1 }
- index: index_2
- query:
match_all: {}
- index: index_1
- query:
match: {foo: foo}

- match: { responses.0.hits.total.value: 1 }
- match: { responses.0.hits.total.relation: gte }
- match: { responses.1.hits.total.value: 1 }
- match: { responses.1.hits.total.relation: eq }
- match: { responses.2.hits.total.value: 1 }
- match: { responses.2.hits.total.relation: eq }

- do:
catch: /\[rest_total_hits_as_int\] cannot be used if the tracking of total hits is not accurate, got 10/
msearch:
rest_total_hits_as_int: true
body:
- index: index_*
- { query: { match_all: {}}, track_total_hits: 10}
- index: index_2
- query:
match_all: {}
- index: index_1
- query:
match: {foo: foo}
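
The last `msearch` case above expects a request that combines `rest_total_hits_as_int` with an integer `track_total_hits` to be rejected. A minimal Java sketch of such a guard follows. It assumes that a `null` threshold stands for accurate tracking, and it does not model `track_total_hits: false`. `RestTotalHitsGuardSketch` is a hypothetical name, and this is not the body of `RestSearchAction.checkRestTotalHits`, whose implementation is not shown in this diff. Only the error message is taken from the test.

```java
// Hypothetical guard; not the actual RestSearchAction.checkRestTotalHits implementation.
final class RestTotalHitsGuardSketch {

    /**
     * Rejects requests that ask for an integer total while only tracking a lower bound.
     *
     * @param totalHitsAsInt     whether `rest_total_hits_as_int` was set to true
     * @param trackTotalHitsUpTo integer threshold, or null when tracking is accurate (assumption)
     */
    static void check(boolean totalHitsAsInt, Integer trackTotalHitsUpTo) {
        if (!totalHitsAsInt || trackTotalHitsUpTo == null) {
            return; // nothing to reject
        }
        // A bare integer cannot carry the "gte" relation, so the combination is refused.
        throw new IllegalArgumentException(
            "[rest_total_hits_as_int] cannot be used if the tracking of total hits is not accurate, got "
                + trackTotalHitsUpTo);
    }
}
```

Calling `RestTotalHitsGuardSketch.check(true, 10)` throws with a message ending in `got 10`, matching the pattern the test catches.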


