-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Implement FIRST/LAST aggregate functions #37936
Changes from 5 commits
1be7ba1
3dd561a
cea52e1
43092ad
cd4df06
f2a516d
167ecf6
83daf26
fea067f
04c0e14
2ca0724
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -119,35 +119,53 @@ include-tagged::{sql-specs}/docs.csv-spec[aggCountDistinct] | |
.Synopsis: | ||
[source, sql] | ||
-------------------------------------------------- | ||
FIRST(field_name<1>, sort_by_field_name<2>) | ||
FIRST(field_name<1>) | ||
FIRST_VALUE(field_name<1>) | ||
|
||
FIRST(field_name<1>, ordering_field_name<2>) | ||
FIRST_VALUE(field_name<1>, ordering_field_name<2>) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Square brackets for optional fields, please. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But now we list both with 1 arg and 2args, do you think is still necessary? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we do it like this (all versions listed - with/without the optional parameter), our documentation will be inconsistent: https://github.com/elastic/elasticsearch/blob/master/docs/reference/sql/functions/string.asciidoc#locate. If I would have to choose between the two approaches, I would go with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In my opinion you either have one signature with the 2nd arg in square brackets or you list the 2 variants. Personally I prefer the one signature and the square brackets. @costin what do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I opt for the signature with the second arguments in square brackets. |
||
-------------------------------------------------- | ||
|
||
*Input*: | ||
|
||
<1> a field name | ||
<2> a field name; optional | ||
<1> target field for the aggregation | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "aggregation" refers to technical implementation in the background. My personal approach would be not to expose this, but try to explain what the field is used for from the end-user perspective. Again, personal preference. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not necessarily - for the aggregate function (aggregation is ES terminology, aggregate is SQL). |
||
<2> optional field used for ordering | ||
|
||
*Output*: same type as the input | ||
|
||
.Description: | ||
|
||
When only one argument is provided it returns the first **non-NULL** value across input values in the field | ||
`field_name`. It will return **NULL** only if all values in `field_name` are null. When a second argument | ||
is provided then it returns the first **non-NULL** value across input values in the field `field_name` ordered | ||
ascending by the **non-NULL** values of `sort_by_field_name`. E.g.: | ||
Returns the first **non-NULL** value (if such exists) of the `field_name` input column sorted by | ||
the `ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name` | ||
column is used for the sorting. E.g.: | ||
|
||
[cols="<,<"] | ||
|=== | ||
s|a|b | ||
| 100 | 1 | ||
| 200 | 1 | ||
| 1 | 2 | ||
| 2 | 2 | ||
| 10 | null | ||
| 20 | null | ||
| | ||
s| a | b | ||
|
||
| 100 | 1 | ||
| 200 | 1 | ||
| 1 | 2 | ||
| 2 | 2 | ||
| 10 | null | ||
| 20 | null | ||
| null | null | ||
|=== | ||
|
||
[source, sql] | ||
---------------------- | ||
SELECT FIRST(a) FROM t | ||
---------------------- | ||
|
||
will result in: | ||
[cols="<"] | ||
|=== | ||
s| FIRST(a) | ||
| 1 | ||
|=== | ||
|
||
and | ||
|
||
[source, sql] | ||
------------------------- | ||
SELECT FIRST(a, b) FROM t | ||
|
@@ -156,8 +174,8 @@ SELECT FIRST(a, b) FROM t | |
will result in: | ||
[cols="<"] | ||
|=== | ||
s|FIRST(a, b) | ||
| 100 | ||
s| FIRST(a, b) | ||
| 100 | ||
|=== | ||
|
||
|
||
|
@@ -166,49 +184,77 @@ s|FIRST(a, b) | |
include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArg] | ||
----------------------------------------------------------- | ||
|
||
["source","sql",subs="attributes,macros"] | ||
-------------------------------------------------------------------- | ||
include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArgAndGroupBy] | ||
-------------------------------------------------------------------- | ||
|
||
["source","sql",subs="attributes,macros"] | ||
----------------------------------------------------------- | ||
include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgs] | ||
----------------------------------------------------------- | ||
|
||
["source","sql",subs="attributes,macros"] | ||
--------------------------------------------------------------------- | ||
include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgsAndGroupBy] | ||
--------------------------------------------------------------------- | ||
|
||
[NOTE] | ||
`FIRST` cannot be used in to create a filter in a `HAVING` clause of a `GROUP BY` query. | ||
`FIRST` cannot be used in a HAVING clause. | ||
|
||
[[sql-functions-aggs-last]] | ||
===== `LAST/LAST_VALUE` | ||
|
||
.Synopsis: | ||
[source, sql] | ||
-------------------------------------------------- | ||
LAST(field_name<1>, sort_by_field_name<2>) | ||
LAST(field_name<1>) | ||
LAST_VALUE(field_name<1>) | ||
|
||
LAST(field_name<1>, ordering_field_name<2>) | ||
LAST_VALUE(field_name<1>, ordering_field_name<2>) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Square brackets for optional fields, please. |
||
-------------------------------------------------- | ||
|
||
*Input*: | ||
|
||
<1> a field name | ||
<2> a field name; optional | ||
<1> target field for the aggregation | ||
<2> optional field used for ordering | ||
|
||
*Output*: same type as the input | ||
|
||
.Description: | ||
|
||
It's the inverse of <<sql-functions-aggs-first>>. When only one argument is provided it returns the | ||
last **non-NULL** value across input values in the field `field_name`. It will return **NULL** only if | ||
all values in `field_name` are null. When a second argument is provided then it returns the last | ||
**non-NULL** value across input values in the field `field_name` ordered descending by the **non-NULL** | ||
values of `sort_by_field_name`. E.g.: | ||
It's the inverse of <<sql-functions-aggs-first>>. Returns the last **non-NULL** value (if such exists) of the | ||
`field_name`input column sorted descending by the `ordering_field_name` column. If `ordering_field_name` is not | ||
provided, only the `field_name` column is used for the sorting. E.g.: | ||
|
||
[cols="<,<"] | ||
|=== | ||
s|a|b | ||
| 10 | 1 | ||
| 20 | 1 | ||
| 1 | 2 | ||
| 2 | 2 | ||
| 100 | null | ||
| 200 | null | ||
s| a | b | ||
|
||
| 10 | 1 | ||
| 20 | 1 | ||
| 1 | 2 | ||
| 2 | 2 | ||
| 100 | null | ||
| 200 | null | ||
| null | null | ||
|=== | ||
|
||
[source, sql] | ||
------------------------ | ||
SELECT LAST(a) FROM t | ||
------------------------ | ||
|
||
will result in: | ||
[cols="<"] | ||
|=== | ||
s| LAST(a) | ||
| 200 | ||
|=== | ||
|
||
and | ||
|
||
[source, sql] | ||
------------------------ | ||
SELECT LAST(a, b) FROM t | ||
|
@@ -217,8 +263,8 @@ SELECT LAST(a, b) FROM t | |
will result in: | ||
[cols="<"] | ||
|=== | ||
s|LAST(a, b) | ||
| 2 | ||
s| LAST(a, b) | ||
| 2 | ||
|=== | ||
|
||
|
||
|
@@ -227,13 +273,23 @@ s|LAST(a, b) | |
include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArg] | ||
----------------------------------------------------------- | ||
|
||
["source","sql",subs="attributes,macros"] | ||
------------------------------------------------------------------- | ||
include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArgAndGroupBy] | ||
------------------------------------------------------------------- | ||
|
||
["source","sql",subs="attributes,macros"] | ||
----------------------------------------------------------- | ||
include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgs] | ||
----------------------------------------------------------- | ||
|
||
["source","sql",subs="attributes,macros"] | ||
-------------------------------------------------------------------- | ||
include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgsAndGroupBy] | ||
-------------------------------------------------------------------- | ||
|
||
[NOTE] | ||
`LAST` cannot be used in to create a filter in a `HAVING` clause of a `GROUP BY` query. | ||
`LAST` cannot be used in `HAVING` clause. | ||
|
||
|
||
[[sql-functions-aggs-max]] | ||
|
@@ -260,6 +316,10 @@ Returns the maximum value across input values in the field `field_name`. | |
include-tagged::{sql-specs}/docs.csv-spec[aggMax] | ||
-------------------------------------------------- | ||
|
||
[NOTE] | ||
`MAX` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into | ||
<<sql-functions-aggs-last>> and therefore, it cannot be used in `HAVING` clause. | ||
|
||
[[sql-functions-aggs-min]] | ||
===== `MIN` | ||
|
||
|
@@ -285,7 +345,8 @@ include-tagged::{sql-specs}/docs.csv-spec[aggMin] | |
-------------------------------------------------- | ||
|
||
[NOTE] | ||
`MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into <<sql-functions-aggs-first>>. | ||
`MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into | ||
<<sql-functions-aggs-first>> and therefore, it cannot be used in `HAVING` clause. | ||
|
||
[[sql-functions-aggs-sum]] | ||
===== `SUM` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for optional arguments, it would be nice to add an example with
FIRST
just with the first argument.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean here in the synopsis? because we have an example here: https://github.com/elastic/elasticsearch/pull/37936/files/1be7ba194c950ab175ca0e35d927ab0b54306be2#diff-f71f0c2cd27202241aec808b033b8510R166
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, for synopsis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@costin our documentation, until now, lists one version of a function in synopsis where the optional arguments are between square brackets. See https://github.com/elastic/elasticsearch/blob/master/docs/reference/sql/functions/string.asciidoc#locate and https://github.com/elastic/elasticsearch/blob/master/docs/reference/sql/functions/math.asciidoc#round (and others). Listing all versions of a function (with/without optional params) imo would look a bit crowded and not-so-clean and sleek. In this particular case of FIRST/FIRST_VALUE/LAST/LAST_VALUE means we need to list four versions of the same function in the synopsis and I personally believe it doesn't look that great.
Do you believe that, maybe, it is not clear that parameter is optional? (the square brackets and the description saying "optional")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the optional parameter but the method name that I'd like to be exemplified through a query to make the function name more obvious.
It might just be easier to create a separate section for the alias, mentioning that it's an alias for ...
This can be done in a separate PR though since it's just documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added here: https://github.com/elastic/elasticsearch/pull/37936/files#diff-f71f0c2cd27202241aec808b033b8510R198