Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Implement FIRST/LAST aggregate functions #37936

Merged
merged 11 commits into from
Jan 31, 2019
135 changes: 98 additions & 37 deletions docs/reference/sql/functions/aggs.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -119,35 +119,53 @@ include-tagged::{sql-specs}/docs.csv-spec[aggCountDistinct]
.Synopsis:
[source, sql]
--------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for optional arguments, it would be nice to add an example with FIRST just with the first argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, for synopsis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin our documentation, until now, lists one version of a function in synopsis where the optional arguments are between square brackets. See https://github.com/elastic/elasticsearch/blob/master/docs/reference/sql/functions/string.asciidoc#locate and https://github.com/elastic/elasticsearch/blob/master/docs/reference/sql/functions/math.asciidoc#round (and others). Listing all versions of a function (with/without optional params) imo would look a bit crowded and not-so-clean and sleek. In this particular case of FIRST/FIRST_VALUE/LAST/LAST_VALUE means we need to list four versions of the same function in the synopsis and I personally believe it doesn't look that great.

Do you believe that, maybe, it is not clear that parameter is optional? (the square brackets and the description saying "optional")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the optional parameter but the method name that I'd like to be exemplified through a query to make the function name more obvious.
It might just be easier to create a separate section for the alias, mentioning that it's an alias for ...
This can be done in a separate PR though since it's just documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIRST(field_name<1>, sort_by_field_name<2>)
FIRST(field_name<1>)
FIRST_VALUE(field_name<1>)

FIRST(field_name<1>, ordering_field_name<2>)
FIRST_VALUE(field_name<1>, ordering_field_name<2>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Square brackets for optional fields, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But now we list both with 1 arg and 2args, do you think is still necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do it like this (all versions listed - with/without the optional parameter), our documentation will be inconsistent: https://github.com/elastic/elasticsearch/blob/master/docs/reference/sql/functions/string.asciidoc#locate. If I would have to choose between the two approaches, I would go with [ ].

Copy link
Contributor Author

@matriv matriv Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion you either have one signature with the 2nd arg in square brackets or you list the 2 variants. Personally I prefer the one signature and the square brackets. @costin what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opt for the signature with the second arguments in square brackets.

--------------------------------------------------

*Input*:

<1> a field name
<2> a field name; optional
<1> target field for the aggregation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"aggregation" refers to technical implementation in the background. My personal approach would be not to expose this, but try to explain what the field is used for from the end-user perspective. Again, personal preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily - for the aggregate function (aggregation is ES terminology, aggregate is SQL).

<2> optional field used for ordering

*Output*: same type as the input

.Description:

When only one argument is provided it returns the first **non-NULL** value across input values in the field
`field_name`. It will return **NULL** only if all values in `field_name` are null. When a second argument
is provided then it returns the first **non-NULL** value across input values in the field `field_name` ordered
ascending by the **non-NULL** values of `sort_by_field_name`. E.g.:
Returns the first **non-NULL** value (if such exists) of the `field_name` input column sorted by
the `ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name`
column is used for the sorting. E.g.:

[cols="<,<"]
|===
s|a|b
| 100 | 1
| 200 | 1
| 1 | 2
| 2 | 2
| 10 | null
| 20 | null
|
s| a | b

| 100 | 1
| 200 | 1
| 1 | 2
| 2 | 2
| 10 | null
| 20 | null
| null | null
|===

[source, sql]
----------------------
SELECT FIRST(a) FROM t
----------------------

will result in:
[cols="<"]
|===
s| FIRST(a)
| 1
|===

and

[source, sql]
-------------------------
SELECT FIRST(a, b) FROM t
Expand All @@ -156,8 +174,8 @@ SELECT FIRST(a, b) FROM t
will result in:
[cols="<"]
|===
s|FIRST(a, b)
| 100
s| FIRST(a, b)
| 100
|===


Expand All @@ -166,49 +184,77 @@ s|FIRST(a, b)
include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArg]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArgAndGroupBy]
--------------------------------------------------------------------

["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgs]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
---------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgsAndGroupBy]
---------------------------------------------------------------------

[NOTE]
`FIRST` cannot be used in to create a filter in a `HAVING` clause of a `GROUP BY` query.
`FIRST` cannot be used in a HAVING clause.

[[sql-functions-aggs-last]]
===== `LAST/LAST_VALUE`

.Synopsis:
[source, sql]
--------------------------------------------------
LAST(field_name<1>, sort_by_field_name<2>)
LAST(field_name<1>)
LAST_VALUE(field_name<1>)

LAST(field_name<1>, ordering_field_name<2>)
LAST_VALUE(field_name<1>, ordering_field_name<2>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Square brackets for optional fields, please.

--------------------------------------------------

*Input*:

<1> a field name
<2> a field name; optional
<1> target field for the aggregation
<2> optional field used for ordering

*Output*: same type as the input

.Description:

It's the inverse of <<sql-functions-aggs-first>>. When only one argument is provided it returns the
last **non-NULL** value across input values in the field `field_name`. It will return **NULL** only if
all values in `field_name` are null. When a second argument is provided then it returns the last
**non-NULL** value across input values in the field `field_name` ordered descending by the **non-NULL**
values of `sort_by_field_name`. E.g.:
It's the inverse of <<sql-functions-aggs-first>>. Returns the last **non-NULL** value (if such exists) of the
`field_name`input column sorted descending by the `ordering_field_name` column. If `ordering_field_name` is not
provided, only the `field_name` column is used for the sorting. E.g.:

[cols="<,<"]
|===
s|a|b
| 10 | 1
| 20 | 1
| 1 | 2
| 2 | 2
| 100 | null
| 200 | null
s| a | b

| 10 | 1
| 20 | 1
| 1 | 2
| 2 | 2
| 100 | null
| 200 | null
| null | null
|===

[source, sql]
------------------------
SELECT LAST(a) FROM t
------------------------

will result in:
[cols="<"]
|===
s| LAST(a)
| 200
|===

and

[source, sql]
------------------------
SELECT LAST(a, b) FROM t
Expand All @@ -217,8 +263,8 @@ SELECT LAST(a, b) FROM t
will result in:
[cols="<"]
|===
s|LAST(a, b)
| 2
s| LAST(a, b)
| 2
|===


Expand All @@ -227,13 +273,23 @@ s|LAST(a, b)
include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArg]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
-------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArgAndGroupBy]
-------------------------------------------------------------------

["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgs]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgsAndGroupBy]
--------------------------------------------------------------------

[NOTE]
`LAST` cannot be used in to create a filter in a `HAVING` clause of a `GROUP BY` query.
`LAST` cannot be used in `HAVING` clause.


[[sql-functions-aggs-max]]
Expand All @@ -260,6 +316,10 @@ Returns the maximum value across input values in the field `field_name`.
include-tagged::{sql-specs}/docs.csv-spec[aggMax]
--------------------------------------------------

[NOTE]
`MAX` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
<<sql-functions-aggs-last>> and therefore, it cannot be used in `HAVING` clause.

[[sql-functions-aggs-min]]
===== `MIN`

Expand All @@ -285,7 +345,8 @@ include-tagged::{sql-specs}/docs.csv-spec[aggMin]
--------------------------------------------------

[NOTE]
`MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into <<sql-functions-aggs-first>>.
`MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
<<sql-functions-aggs-first>> and therefore, it cannot be used in `HAVING` clause.

[[sql-functions-aggs-sum]]
===== `SUM`
Expand Down
44 changes: 42 additions & 2 deletions x-pack/plugin/sql/qa/src/main/resources/agg.csv-spec
Original file line number Diff line number Diff line change
Expand Up @@ -379,7 +379,7 @@ schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name) as first, LAST(first_name) as last FROM test_emp GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+------------
---------------+---------------+---------------
null | Berni | Patricio
F | Alejandro | Xinglin
M | Amabile | Zvonko
Expand All @@ -390,12 +390,52 @@ schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) as first, LAST(first_name, birth_date) as last FROM test_emp GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+-----------------
---------------+---------------+---------------
null | Lillian | Eberhardt
F | Sumant | Valdiodio
M | Remzi | Hilari
;

topHitsWithTwoArgsAndGroupByWithNullsOnTargetField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10025 AND 10035 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | null | Divier
M | null | Domenick
;

topHitsWithTwoArgsAndGroupByWithNullsOnSortingField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10047 AND 10052 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | Basil | Basil
M | Hidefumi | Heping
;

topHitsWithTwoArgsAndGroupByWithNullsOnTargetAndSortingField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10037 AND 10052 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+-------------+-----------------
F | Basil | Weiyi
M | Hidefumi | null
;

topHitsWithTwoArgsAndGroupByWithAllNullsOnTargetField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10030 AND 10037 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | null | null
M | null | null
;

topHitsOnDatetime
schema::gender:s|first:i|last:i
SELECT gender, month(first(birth_date, languages)) first, month(last(birth_date, languages)) last FROM test_emp GROUP BY gender ORDER BY gender;
Expand Down
Loading