[DOCS] Rewrite term query docs for new format #41498

Merged
merged 5 commits into from
May 6, 2019
294 changes: 173 additions & 121 deletions docs/reference/query-dsl/term-query.asciidoc
@@ -1,168 +1,220 @@
[[query-dsl-term-query]]
=== Term Query

Returns documents that contain an *exact* term in a provided field.

You can use the `term` query to find documents based on a precise value such as
a price, a product ID, or a username.
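
As a rough mental model only, a `term` query on a precise value behaves like an exact equality check against the stored value. The sketch below uses hypothetical sample documents and a hypothetical `term_filter` helper. It is not how {es} executes the query, but it shows that the comparison is exact, including capitalization.

```python
from typing import Any

# Hypothetical sample documents; the field names are illustrative only.
DOCS: list[dict[str, Any]] = [
    {"user": "Kimchy", "price": 10, "product_id": "SKU-001"},
    {"user": "kimchy", "price": 25, "product_id": "SKU-002"},
]


def term_filter(docs: list[dict[str, Any]], field: str, value: Any) -> list[dict[str, Any]]:
    """Return the docs whose `field` equals `value` exactly (no analysis)."""
    return [doc for doc in docs if doc.get(field) == value]


if __name__ == "__main__":
    # Capitalization matters: only the first document matches.
    print(term_filter(DOCS, "user", "Kimchy"))
    # A precise numeric value, such as a price.
    print(term_filter(DOCS, "price", 25))
```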

[WARNING]
====
Avoid using the `term` query for <<text, `text`>> fields.

By default, {es} changes the values of `text` fields as part of <<analysis,
analysis>>. This can make finding exact matches for `text` field values
difficult.

To search `text` field values, use the <<query-dsl-match-query,`match`>> query
instead.
====

[[term-query-ex-request]]
==== Example request

[source,js]
----
GET /_search
{
    "query": {
        "term": {
            "user": {
                "value": "Kimchy",
                "boost": 1.0
            }
        }
    }
}
----
// CONSOLE
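
If you build query bodies in application code, the nesting used in the example request (the field name, then an object holding `value` and `boost`) can be produced with a small helper. `build_term_query` is a hypothetical name used only for illustration. The sketch uses only the Python standard library and does not send anything to {es}.

```python
import json
from typing import Any


def build_term_query(field: str, value: Any, boost: float = 1.0) -> dict[str, Any]:
    """Build a `term` query body shaped like the example request.

    `field` becomes the key under "term"; `value` and `boost` are the
    parameters documented for `<field>`. The default boost mirrors the
    documented default of 1.0.
    """
    return {"query": {"term": {field: {"value": value, "boost": boost}}}}


if __name__ == "__main__":
    body = build_term_query("user", "Kimchy")
    # Serialize to JSON, for example to send as the body of GET /_search.
    print(json.dumps(body, indent=4))
```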

Contributor

Whilst I'm not wedded to having this explanation in this page of the documentation, I think it would be useful to make sure we explain the difference between searching with analyzed or not analyzed queries somewhere, since it gives some understanding of how search works and how to avoid some pitfalls. wdyt?

Contributor Author

Thanks for the feedback.

I think the warning provides enough information for users looking to get started quickly, but it doesn't explain why you should avoid using term-level queries for analyzed fields. The example in the aside is great for that.

I think that content would fit better in a concept-focused page like "Search structured data" or "Search full-text." Those don't exist yet, but I'll work on creating them and add it to this PR.

Contributor

That sounds great, thanks

[[term-top-level-params]]
==== Top-level parameters for `term`
`<field>`::
Field you wish to search.

[[term-field-params]]
==== Parameters for `<field>`
`value`::
Term you wish to find in the provided `<field>`. To return a document, the term
must exactly match the field value, including whitespace and capitalization.

`boost`::
Floating point number used to decrease or increase the
<<query-filter-context, relevance scores>> of a query. Default is `1.0`.
Optional.
+
You can use the `boost` parameter to adjust relevance scores for searches
containing two or more queries.
+
Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
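
The sketch below models `boost` as a plain multiplier on the score a query contributes, which is why values are easiest to read relative to `1.0`. This is a simplification made for illustration: the base score is a made-up number rather than real scoring output, and `boosted_score` is a hypothetical helper, not part of any {es} client.

```python
def boosted_score(base_score: float, boost: float = 1.0) -> float:
    """Scale a query's relevance score by its boost (simplified model)."""
    return base_score * boost


if __name__ == "__main__":
    base = 0.8  # made-up score, for illustration only
    print(boosted_score(base))       # default boost of 1.0: score unchanged
    print(boosted_score(base, 0.5))  # between 0 and 1.0: lower score
    print(boosted_score(base, 2.0))  # greater than 1.0: higher score
```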

[[term-query-notes]]
==== Notes

[[avoid-term-query-text-fields]]
===== Avoid using the `term` query for `text` fields
By default, {es} changes the values of `text` fields during analysis. For
example, the default <<analysis-standard-analyzer, standard analyzer>> changes
`text` field values as follows:

* Removes most punctuation
* Divides the remaining content into individual words, called
<<analysis-tokenizers, tokens>>
* Lowercases the tokens
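
To make these three steps concrete, the sketch below approximates them in plain Python. `approx_standard_analyze` is a hypothetical helper, not the real analyzer: the actual `standard` tokenizer follows the Unicode Text Segmentation rules (UAX #29) and handles many cases this regular expression does not, so treat the output only as an illustration for simple English input.

```python
import re


def approx_standard_analyze(text: str) -> list[str]:
    """Roughly mimic the `standard` analyzer for simple English text.

    Splitting on runs of non-word characters both divides the text into
    tokens and drops most punctuation; each token is then lowercased.
    """
    tokens = re.split(r"\W+", text)
    return [token.lower() for token in tokens if token]


if __name__ == "__main__":
    print(approx_standard_analyze("Quick Brown Foxes!"))
```

Note that `Foxes` becomes `foxes`, not `fox`: the `standard` analyzer lowercases tokens but does not stem them.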

To better search `text` fields, the `match` query also analyzes your provided
search term before performing a search. This means the `match` query can search
`text` fields for analyzed tokens rather than an exact term.

The `term` query does *not* analyze the search term. The `term` query only
searches for the *exact* term you provide. This means the `term` query may
return poor or no results when searching `text` fields.
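
The difference can be modeled with two hypothetical functions over the tokens stored for one `text` field value. `term_matches` looks up the search term as-is, while `match_matches` analyzes it first and, as a simplification of the `match` query's default `or` behavior, matches if any resulting token is present. Both rely on a rough stand-in for the `standard` analyzer rather than the real one.

```python
import re


def analyze(text: str) -> list[str]:
    """Rough stand-in for the `standard` analyzer: split and lowercase."""
    return [token.lower() for token in re.split(r"\W+", text) if token]


def term_matches(field_tokens: set[str], search_term: str) -> bool:
    """Model of `term`: the search term is NOT analyzed."""
    return search_term in field_tokens


def match_matches(field_tokens: set[str], query_text: str) -> bool:
    """Model of `match` (default OR): analyze, then match any token."""
    return any(token in field_tokens for token in analyze(query_text))


if __name__ == "__main__":
    stored = set(analyze("Quick Brown Foxes!"))  # tokens: quick, brown, foxes
    print(term_matches(stored, "Quick Brown Foxes!"))  # exact term not stored
    print(term_matches(stored, "foxes"))               # lowercase token stored
    print(match_matches(stored, "Quick Brown Foxes!"))
```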

To see the difference in search results, try the following example.

. Create an index with a `text` field called `full_text`.
+
--

[source,js]
----
PUT my_index
{
    "mappings" : {
        "properties" : {
            "full_text" : { "type" : "text" }
        }
    }
}
----
// CONSOLE

--

. Index a document with a value of `Quick Brown Foxes!` in the `full_text`
field.
+
--

[source,js]
----
PUT my_index/_doc/1
{
    "full_text": "Quick Brown Foxes!"
}
----
// CONSOLE
// TEST[continued]

Because `full_text` is a `text` field, {es} changes `Quick Brown Foxes!` to
`[quick, brown, foxes]` during analysis.

--

Member

Just out of curiosity: are there follow up plans to move these more detailed examples somewhere else? While I like the idea of having succinct, standardized docs for each query, I find these kind of examples quite useful and better to understand than merely the minimal snippet that remains. Not saying this shouldn't go away, I'm just curious what the plan is for examples like this in this rewriting effort.

Contributor Author

Thanks for the feedback @cbuescher.

For this particular set of examples, I think a concept-focused page like "Search structured data" or "Search full-text" would be a better fit. Those pages don't exist yet, but I'll work on creating them and add it to this PR.

For other queries, I think we can include additional sections for detailed examples below parameter documentation. A good example of this would be the missing query section for the exists query.

Let me know if you feel differently. I'm still new so there could be context I'm missing.

. Use the `term` query to search for `Quick Brown Foxes!` in the `full_text`
field. Include the `pretty` parameter so the response is more readable.
+
--

[source,js]
----
GET my_index/_search?pretty
{
    "query": {
        "term": {
            "full_text": "Quick Brown Foxes!"
        }
    }
}
----
// CONSOLE
// TEST[continued]

Because the `full_text` field no longer contains the *exact* term `Quick Brown
Foxes!`, the `term` query search returns no results.

--

. Use the `match` query to search for `Quick Brown Foxes!` in the `full_text`
field.
+
--

////

[source,js]
----
POST my_index/_refresh
----
// CONSOLE
// TEST[continued]

////

[source,js]
----
GET my_index/_search?pretty
{
    "query": {
        "match": {
            "full_text": "Quick Brown Foxes!"
        }
    }
}
----
// CONSOLE
// TEST[continued]

Unlike the `term` query, the `match` query analyzes your provided search term,
`Quick Brown Foxes!`, before performing a search. The `match` query then returns
any documents containing the `quick`, `brown`, or `foxes` tokens in the
`full_text` field.

The following response to the `match` query contains the indexed document.

[source,js]
----
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.8630463,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.8630463,
"_source" : {
"full_text" : "Quick Brown Foxes!"
}
}
]
}
}
----
// TESTRESPONSE[s/"took" : 1/"took" : $body.took/]
--
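
The four steps above can be replayed against a toy inverted index. This is a minimal sketch, assuming a simplified in-memory index (the hypothetical `ToyIndex` class) and the same rough stand-in for the `standard` analyzer. It returns matching document IDs only and does not compute relevance scores, so it will not reproduce the `_score` values in the response above.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Rough stand-in for the `standard` analyzer: split and lowercase."""
    return [token.lower() for token in re.split(r"\W+", text) if token]


class ToyIndex:
    """Hypothetical inverted index for a single `text` field."""

    def __init__(self) -> None:
        # Maps each token to the set of document IDs that contain it.
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def index(self, doc_id: str, full_text: str) -> None:
        # Like a `text` field: analyze the value before storing its terms.
        for token in analyze(full_text):
            self.postings[token].add(doc_id)

    def term(self, value: str) -> set[str]:
        # Like `term`: look up the search term exactly as given.
        return set(self.postings.get(value, set()))

    def match(self, query_text: str) -> set[str]:
        # Like `match` with the default OR operator: analyze, then union.
        hits: set[str] = set()
        for token in analyze(query_text):
            hits |= self.postings.get(token, set())
        return hits


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index("1", "Quick Brown Foxes!")
    print(idx.term("Quick Brown Foxes!"))   # step 3: no matching documents
    print(idx.match("Quick Brown Foxes!"))  # step 4: document "1" matches
```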