Reorganise scripting docs (#18132)
* Reorganize scripting documentation

* Further changes to tidy up scripting docs

Closes #18116

* Add note about .lat/lon potentially returning null

* Added .value to expressions example

* Fixed two bad ASCIIDOC links
clintongormley committed May 4, 2016
1 parent 5a0cfdd commit 34d90b0
Showing 11 changed files with 1,108 additions and 777 deletions.
2 changes: 0 additions & 2 deletions docs/reference/modules.asciidoc
@@ -94,8 +94,6 @@ include::modules/network.asciidoc[]

include::modules/node.asciidoc[]

include::modules/painless.asciidoc[]

include::modules/plugins.asciidoc[]

include::modules/scripting.asciidoc[]
103 changes: 101 additions & 2 deletions docs/reference/modules/scripting.asciidoc
@@ -1,5 +1,104 @@
include::scripting/scripting.asciidoc[]
[[modules-scripting]]
== Scripting

include::scripting/advanced-scripting.asciidoc[]
The scripting module enables you to use scripts to evaluate custom
expressions. For example, you could use a script to return "script fields"
as part of a search request or evaluate a custom score for a query.

TIP: Elasticsearch now has a built-in scripting language called _Painless_
that provides a more secure alternative for implementing
scripts for Elasticsearch. We encourage you to try it out --
for more information, see <<modules-scripting-painless, Painless Scripting Language>>.

The default scripting language is http://groovy-lang.org/[groovy].
Additional `lang` plugins enable you to run scripts written in other languages.
Everywhere a script can be used, you can include a `lang` parameter
to specify the language of the script.

[float]
=== General-purpose languages:

These languages can be used for any purpose in the scripting APIs,
and give the most flexibility.

[cols="<,<,<",options="header",]
|=======================================================================
|Language
|Sandboxed
|Required plugin

|<<modules-scripting-painless, `painless`>>
|yes
|built-in

|<<modules-scripting-groovy, `groovy`>>
|<<modules-scripting-security, no>>
|built-in

|{plugins}/lang-javascript.html[`javascript`]
|<<modules-scripting-security, no>>
|{plugins}/lang-javascript.html[`lang-javascript`]

|{plugins}/lang-python.html[`python`]
|<<modules-scripting-security, no>>
|{plugins}/lang-python.html[`lang-python`]

|=======================================================================

[float]
=== Special-purpose languages:

These languages are less flexible, but typically have higher performance for
certain tasks.

[cols="<,<,<,<",options="header",]
|=======================================================================
|Language
|Sandboxed
|Required plugin
|Purpose

|<<modules-scripting-expression, `expression`>>
|yes
|built-in
|fast custom ranking and sorting

|<<search-template, `mustache`>>
|yes
|built-in
|templates

|<<modules-scripting-native, `java`>>
|n/a
|you write it!
|expert API

|=======================================================================

[WARNING]
.Scripts and security
=================================================
Languages that are sandboxed are designed with security in mind. However,
non-sandboxed languages can be a security issue. Please read
<<modules-scripting-security, Scripting and security>> for more details.
=================================================


include::scripting/using.asciidoc[]

include::scripting/fields.asciidoc[]

include::scripting/security.asciidoc[]

include::scripting/groovy.asciidoc[]

include::scripting/painless.asciidoc[]

include::scripting/expression.asciidoc[]

include::scripting/native.asciidoc[]

include::scripting/advanced-scripting.asciidoc[]

20 changes: 12 additions & 8 deletions docs/reference/modules/scripting/advanced-scripting.asciidoc
@@ -1,13 +1,17 @@
[[modules-advanced-scripting]]
=== Text scoring in scripts
=== Advanced text scoring in scripts

experimental[The functionality described on this page is considered experimental and may be changed or removed in a future release]

Text features, such as term or document frequency for a specific term can be accessed in scripts (see <<modules-scripting, scripting documentation>> ) with the `_index` variable. This can be useful if, for example, you want to implement your own scoring model using for example a script inside a <<query-dsl-function-score-query,function score query>>.
Text features, such as term or document frequency for a specific term, can be
accessed in scripts with the `_index` variable. This is useful if, for
example, you want to implement your own scoring model with a
script inside a <<query-dsl-function-score-query,function score query>>.
Statistics over the document collection are computed *per shard*, not per
index.

[float]
==== Nomenclature:
=== Nomenclature:


[horizontal]
@@ -33,7 +37,7 @@ depending on the shard the current document resides in.


[float]
==== Shard statistics:
=== Shard statistics:

`_index.numDocs()`::

@@ -49,7 +53,7 @@ depending on the shard the current document resides in.


[float]
==== Field statistics:
=== Field statistics:

Field statistics can be accessed with a subscript operator like this:
`_index['FIELD']`.
@@ -74,7 +78,7 @@ depending on the shard the current document resides in.
The number of terms in a field cannot be accessed using the `_index` variable. See <<token-count>> for how to do that.

[float]
==== Term statistics:
=== Term statistics:

Term statistics for a field can be accessed with a subscript operator like
this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist.
@@ -101,7 +105,7 @@ affect is your set the <<index-options,`index_options`>> to `docs`.


[float]
==== Term positions, offsets and payloads:
=== Term positions, offsets and payloads:

If you need information on the positions of terms in a field, call
`_index['FIELD'].get('TERM', flag)` where flag can be
@@ -174,7 +178,7 @@ return score;


[float]
==== Term vectors:
=== Term vectors:

The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (see <<term-vector>>). To access them, call
`_index.termVectors()` to get a
120 changes: 120 additions & 0 deletions docs/reference/modules/scripting/expression.asciidoc
@@ -0,0 +1,120 @@
[[modules-scripting-expression]]
=== Lucene Expressions Language

Lucene's expressions compile a `javascript` expression to bytecode. They are
designed for high-performance custom ranking and sorting functions and are
enabled for `inline` and `stored` scripting by default.

[float]
=== Performance

Expressions were designed to have competitive performance with custom Lucene code.
This performance is due to having low per-document overhead as opposed to other
scripting engines: expressions do more "up-front".

This allows for very fast execution, even faster than if you had written a `native` script.

[float]
=== Syntax

Expressions support a subset of javascript syntax: a single expression.

See the link:http://lucene.apache.org/core/6_0_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html[expressions module documentation]
for details on what operators and functions are available.

The following variables are available in `expression` scripts:

* Document fields, e.g. `doc['myfield'].value`
* Variables and methods that the field supports, e.g. `doc['myfield'].empty`
* Parameters passed into the script, e.g. `mymodifier`
* The current document's score, `_score` (only available when used in a `script_score`)

You can use expression scripts for `script_score`, `script_fields`, sort scripts, and numeric aggregation
scripts. Simply set the `lang` parameter to `expression`.
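
As a rough illustration of setting `lang` to `expression`, the following Python sketch builds a search request body that uses an expression inside a `script_score`. The field name `popularity` is hypothetical, and the `inline` key is an assumption based on the `inline`/`stored` terminology used above; check the request format of your Elasticsearch version before relying on it.

```python
import json


def expression_score_query(expression, params=None):
    """Build a function_score search body whose script uses lang=expression."""
    # "inline" is assumed to be the key for an inline script source.
    script = {"lang": "expression", "inline": expression}
    if params:
        script["params"] = params
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {"script": script},
            }
        }
    }


# `popularity` is a hypothetical numeric field; `mymodifier` is a script parameter.
body = expression_score_query(
    "_score * doc['popularity'].value * mymodifier",
    params={"mymodifier": 2},
)
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a search request. Passing `mymodifier` as a parameter rather than hard-coding it keeps the expression text constant across requests.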

[float]
=== Numeric field API
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The value of the field, as a `double`

|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.

|`doc['field_name'].min()` |The minimum value of the field in this document.

|`doc['field_name'].max()` |The maximum value of the field in this document.

|`doc['field_name'].median()` |The median value of the field in this document.

|`doc['field_name'].avg()` |The average of the values in this document.

|`doc['field_name'].sum()` |The sum of the values in this document.

|`doc['field_name'].count()` |The number of values in this document.
|=======================================================================

When a document is missing the field completely, by default the value will be treated as `0`.
You can treat it as another value instead, e.g. `doc['myfield'].empty ? 100 : doc['myfield'].value`

When a document has multiple values for the field, by default the minimum value is returned.
You can choose a different value instead, e.g. `doc['myfield'].sum()`.

Boolean fields are exposed as numerics, with `true` mapped to `1` and `false` mapped to `0`.
For example: `doc['on_sale'].value ? doc['price'].value * 0.5 : doc['price'].value`
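
The defaults described above can be mimicked outside Elasticsearch to make them concrete. The sketch below is a toy Python stand-in for a numeric field, not the real API: the class `NumericField` is hypothetical and only reproduces the documented behaviour (missing field treated as `0`, minimum returned for multiple values, booleans exposed as `1` and `0`).

```python
class NumericField:
    """Toy stand-in for doc['field_name'] on a numeric field (hypothetical)."""

    def __init__(self, values):
        # Booleans become 1.0 / 0.0, matching the true -> 1, false -> 0 mapping.
        self.values = [float(v) for v in values]

    @property
    def empty(self):
        # True when the document has no values for the field.
        return not self.values

    @property
    def value(self):
        # Missing field -> 0; multiple values -> the minimum.
        return min(self.values) if self.values else 0.0

    def sum(self):
        return sum(self.values)


missing = NumericField([])
multi = NumericField([7, 3, 5])
on_sale = NumericField([True])
price = NumericField([20])

# Mirrors: doc['myfield'].empty ? 100 : doc['myfield'].value
fallback = 100 if missing.empty else missing.value

# Mirrors: doc['on_sale'].value ? doc['price'].value * 0.5 : doc['price'].value
final_price = price.value * 0.5 if on_sale.value else price.value

print(missing.value, multi.value, multi.sum(), fallback, final_price)
```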

[float]
=== Date field API
Date fields are treated as the number of milliseconds since January 1, 1970 and
support the Numeric Fields API above, with these additional methods:

[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].getYear()` |Year component, e.g. `1970`.

|`doc['field_name'].getMonth()` |Month component (0-11), e.g. `0` for January.

|`doc['field_name'].getDayOfMonth()` |Day component, e.g. `1` for the first of the month.

|`doc['field_name'].getHourOfDay()` |Hour component (0-23)

|`doc['field_name'].getMinutes()` |Minutes component (0-59)

|`doc['field_name'].getSeconds()` |Seconds component (0-59)
|=======================================================================

The following example shows the difference in years between the `date` fields `date0` and `date1`:

`doc['date1'].getYear() - doc['date0'].getYear()`
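
To see what these components look like for a concrete value, the following Python sketch derives them from a milliseconds-since-epoch number. The helper `date_components` is hypothetical, and interpreting the timestamp in UTC is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_components(epoch_millis):
    """Split a milliseconds-since-epoch value into date parts (UTC assumed)."""
    dt = EPOCH + timedelta(milliseconds=epoch_millis)
    return {
        "getYear": dt.year,
        "getMonth": dt.month - 1,  # 0-11, so January is 0
        "getDayOfMonth": dt.day,
        "getHourOfDay": dt.hour,
        "getMinutes": dt.minute,
        "getSeconds": dt.second,
    }


date0 = 0  # 1970-01-01T00:00:00Z
date1 = int((datetime(2016, 5, 4, tzinfo=timezone.utc) - EPOCH).total_seconds() * 1000)

# Mirrors: doc['date1'].getYear() - doc['date0'].getYear()
year_difference = date_components(date1)["getYear"] - date_components(date0)["getYear"]
print(date_components(date0), year_difference)
```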

[float]
=== `geo_point` field API
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.

|`doc['field_name'].lat` |The latitude of the geo point, or `null`.

|`doc['field_name'].lon` |The longitude of the geo point, or `null`.
|=======================================================================

The following example computes distance in kilometers from Washington, DC:

`haversin(38.9072, -77.0369, doc['field_name'].lat, doc['field_name'].lon)`

In this example the coordinates could have been passed as parameters to the script,
e.g. based on geolocation of the user.
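
For readers who want to sanity-check distances, here is a minimal Python sketch of the haversine formula. It is not the engine's implementation: the function name `haversin_km` and the Earth radius constant are assumptions of this sketch. Washington, DC lies at about 38.9072° N, 77.0369° W, so its longitude is negative.

```python
import math

# Mean Earth radius in kilometers; an assumption, the engine's constant may differ.
EARTH_RADIUS_KM = 6371.0088


def haversin_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


DC = (38.9072, -77.0369)
NEW_YORK = (40.7128, -74.0060)  # illustrative second point

distance = haversin_km(DC[0], DC[1], NEW_YORK[0], NEW_YORK[1])
print(round(distance, 1))
```

The sign of the longitude matters: flipping it moves the reference point to the other side of the globe and changes every computed distance.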

[float]
=== Limitations

There are a few limitations relative to other script languages:

* Only numeric, boolean, date, and geo_point fields may be accessed
* Stored fields are not available