Add release notes for 0.254 #16141
Conversation
* Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
* Add iceberg connector.
* Add support to fragment result caching for unnest.
* Add ``poisson_cdf`` and ``inverse_poisson_cdf`` functions.
Functions need to be written as :func:`poisson_cdf`.
check e2e4751#diff-9652f5c5a99ec7ab2293ce1dfbee5491f95d8b02ed72259cbac24f0dfbec565fR13
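For reference, the two notations the reviewer is contrasting look like this in the release-notes source (a sketch; double backticks render as plain monospace, while the Sphinx ``:func:`` role also links to the function's documentation entry):

```rst
* Add ``poisson_cdf`` and ``inverse_poisson_cdf`` functions.

* Add :func:`poisson_cdf` and :func:`inverse_poisson_cdf` functions.
```

The second form is preferred here because it cross-references the function reference pages rather than just formatting the names as code.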
General Changes
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
I don't think we need a release note for new documentation. But if we wanted to keep it, it would go in the hive section.
* Add memory tracking in ``TableFinishOperator`` which can be enabled by setting the ``table-finish-operator-memory-tracking-enabled`` configuration property to ``true``.
* Remove spilling strategy ``PER_QUERY_MEMORY_LIMIT`` and instead add configuration property ``experimental.query-limit-spill-enabled`` and session property ``query_limit_spill_enabled``. When this property is set to ``true``, and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory, in addition to whenever the memory pool exceeds the spill threshold. This fixes an issue where using the ``PER_QUERY_MEMORY_LIMIT`` spilling strategy could prevent the oom killer from running when the memory pool was full. The issue is still present for the ``PER_TASK_MEMORY_THRESHOLD`` spilling strategy.

Hive Connector Changes
This should all be part of the Hive Changes section
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
* Add iceberg connector.
I would maybe add a new section called "Iceberg Changes", similar to other connectors, and then have the release note "Add new Iceberg Connector", with a link to the documentation.
removed this since there's no documentation added
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
* Add iceberg connector.
* Add support to fragment result caching for unnest.
what does this mean?
@shixuan-fan could you provide more details
How about "Add fragment result caching support for UNNEST queries"?
Hive Changes
____________
* Add support for MaxResults on Glue Hive Metastore.
is there a configuration property? what is MaxResults and why is it one word?
@v-jizhang can you help add more details around MaxResults
here. I don't see a configuration property.
@sujay-jain @rschlussel There is no configuration property for MaxResults. Set it according to the page https://docs.aws.amazon.com/glue/latest/webapi/API_GetPartitions.html
Hive Changes
____________
* Add support for MaxResults on Glue Hive Metastore.
* Add support for bucket sort order in Glue when creating or updating a table or partition.
what is "bucket sort order" and how does the user use this?
@v-jizhang @highker could you provide more details here.
This is a back port from Trino to fix a bug, trinodb/trino#2450
@rschlussel the release notes match those of Trino, and it's a backport. Do you think we should keep this as is?
Thanks @v-jizhang. looked at that PR. We should make our release note clearer. Something like.
Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue
Also, since it's a bug fix, should go first in the section
* Add support for MaxResults on Glue Hive Metastore.
* Add support for bucket sort order in Glue when creating or updating a table or partition.
* Add support for partition cache validation. This can be enabled by setting ``hive.partition-cache-validation-percentage`` configuration parameter.
* Add support for partition schema evolution for parquet.
Can we add more detail in the release note here about what we mean by schema evolution? Does it mean you can query tables that have had columns added/deleted? column types changed?
@imjalpreet @zhenxiao - could you help provide more details
support add/delete/replace columns in partition schema
@imjalpreet could you please add more details?
We can write something like:
Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index.
If we want to add more:
This allows schema evolution between table and partition in addition to schema evolution between the table/partition and file. Columns can be re-ordered, added or dropped between partition and table schemas.
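To make the suggested note concrete, enabling name-based column matching might look like this (property names as quoted above; the catalog file path is an assumption):

```properties
# etc/catalog/hive.properties (assumed path)
# match Parquet columns between table/partition schemas by name instead of index
hive.parquet.use-column-names=true
```

or per session, via the hive catalog session property: ``SET SESSION hive.parquet_use_column_names = true;``.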
____________
* Add support for MaxResults on Glue Hive Metastore.
* Add support for bucket sort order in Glue when creating or updating a table or partition.
* Add support for partition cache validation. This can be enabled by setting ``hive.partition-cache-validation-percentage`` configuration parameter.
What's partition cache validation? Also, what's the default value? Is it 0, which would I guess mean no validation or is it something else?
@NikhilCollooru can you provide more details here.
Partition cache validation means we validate the value returned from partition cache with the actual value from Metastore. Yes default value is 0.0 meaning no validation. If we set it to 50.0, then it means 50% of the get partitions calls will be validated.
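Based on that description, a sketch of the configuration (the catalog file path is an assumption; the semantics are as stated above):

```properties
# etc/catalog/hive.properties (assumed path)
# 0.0 (default) disables validation; 50.0 validates 50% of get-partitions
# calls against the Metastore
hive.partition-cache-validation-percentage=50.0
```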
Presto On Spark Changes
_______________________
* Improve commit memory footprint on the Driver.
* Add session property ``spark_memory_revoking_threshold`` and configuration property ``spark.memory-revoking-threshold``, spilling is triggered when total memory is beyond this threshold.
Nit: should be a period and not a comma before "spilling is triggered"
Hive Changes
____________
* Add support for MaxResults on Glue Hive Metastore. MaxResults is the maximum number of partitions to return in a single response.
@v-jizhang I'm still very confused about this release note. It seems like what the PR (#16012) actually does is set the batch size for how many partitions we fetch at once in each API call. But @pettyjamesm was skeptical about whether it even works? My instinct is this doesn't need a release note, but if it does it should be something like "Improve efficiency of getting partitions from the Glue Metastore by batching requests". Would that be accurate? @aweisberg @pettyjamesm what do you think?
Yeah I agree. I don't think this is turning on batching the requests it's just limiting the batch size.
Improve efficiency of partition fetching from Glue by setting GetPartitions MaxResults
going with removing the release note.
* Add support for bucket sort order in Glue when creating or updating a table or partition.
* Add support for partition cache validation. This can be enabled by setting the ``hive.partition-cache-validation-percentage`` configuration property. Partition cache validation allows us to validate the value returned from partition cache with the actual value from Metastore. The default value is 0.0 which means there is no validation.
* Add support for partition schema evolution for parquet.
* Add support for Glue endpoint URL :doc:`/connector/hive`.
Add support for configuring the Glue endpoint URL :doc:`/connector/hive`.
* Add support for Glue endpoint URL :doc:`/connector/hive`.
* Add support for the S3 Intelligent-Tiering storage class for writing data. The S3 storage class can be configured using the configuration property ``hive.s3.storage-class``. Supported values are ``STANDARD`` and ``INTELLIGENT_TIERING``, and the default value is ``STANDARD``. :doc:`/connector/hive`.
* Add configuration property ``hive.metastore.glue.max-error-retries`` for the maximum number of retries for glue client connections. The default value is 10. :doc:`/connector/hive`.
* Allow accessing tables in Glue metastore that do not have a table type.
Maybe start this "Add support for" instead of "Allow", so it matches the other release notes.
Looks good, but we'll need to add a release note for the regression fix for maps that's going to be added to the 0.254 branch, so holding off on merging for now.
Can you copy the release note from #16073 - this was added after the cut and hence not picked up by the script that created the initial PR
General Changes
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.
IndexOutOfBoundsException in double back quotes
* Add fragment result caching support for ``UNNEST`` queries.
* Add :func:`poisson_cdf` and :func:`inverse_poisson_cdf` functions.
* Add memory tracking in ``TableFinishOperator`` which can be enabled by setting the ``table-finish-operator-memory-tracking-enabled`` configuration property to ``true``. Enabling this property can help investigating GC issues on the coordinator, by allowing us to debug whether stats collection uses a lot of memory.
* Remove spilling strategy ``PER_QUERY_MEMORY_LIMIT`` and instead add configuration property ``experimental.query-limit-spill-enabled`` and session property ``query_limit_spill_enabled``. When ``query_limit_spill_enabled`` is set to ``true`` and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory. We will also still spill whenever the memory pool exceeds the spill threshold. This fixes an issue where using the ``PER_QUERY_MEMORY_LIMIT`` spilling strategy could prevent the oom killer from running when the memory pool was full. The issue is still present for the ``PER_TASK_MEMORY_THRESHOLD`` spilling strategy.
This is too big for a release note.
@rschlussel - can we not try to get this within 200-250 chars at max - it is fine to provide people a link to the PR for further clarification
how's this? it's 390 characters, but mostly because the property names are long. I don't think it can be shorter and still convey the relevant information.
Remove spilling strategy ``PER_QUERY_MEMORY_LIMIT`` and add configuration property ``experimental.query-limit-spill-enabled`` and session property ``query_limit_spill_enabled``. When set to ``true`` and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory.
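As a sketch of the configuration being discussed (the file path is an assumption; behavior as described in the thread):

```properties
# etc/config.properties (assumed path)
experimental.query-limit-spill-enabled=true
```

or per query: ``SET SESSION query_limit_spill_enabled = true;``. With this enabled and a spill strategy other than ``PER_TASK_MEMORY_THRESHOLD``, a query spills once its combined revocable and non-revocable memory exceeds the per-node total memory limit.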
Hive Changes
____________
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support for partition cache validation. This can be enabled by setting the ``hive.partition-cache-validation-percentage`` configuration property. Partition cache validation allows us to validate the value returned from partition cache with the actual value from Metastore. The default value is 0.0 which means there is no validation.
Add support to validate the values returned from the partition cache with the actual value from Metastore. This can be enabled by setting the configuration property ``hive......``
Should we keep the default value note?
____________
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support for partition cache validation. This can be enabled by setting the ``hive.partition-cache-validation-percentage`` configuration property. Partition cache validation allows us to validate the value returned from partition cache with the actual value from Metastore. The default value is 0.0 which means there is no validation.
* Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index.
true in double back quotes
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support for partition cache validation. This can be enabled by setting the ``hive.partition-cache-validation-percentage`` configuration property. Partition cache validation allows us to validate the value returned from partition cache with the actual value from Metastore. The default value is 0.0 which means there is no validation.
* Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index.
* Add support for configuring the Glue endpoint URL :doc:`/connector/hive`.
Period after URL
Add support for configuring the Glue endpoint URL. See :doc:`/connector/hive`.
* Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index.
* Add support for configuring the Glue endpoint URL :doc:`/connector/hive`.
* Add support for accessing tables in Glue metastore that do not have a table type.
* Add support for the S3 Intelligent-Tiering storage class for writing data. The S3 storage class can be configured using the configuration property ``hive.s3.storage-class``. Supported values are ``STANDARD`` and ``INTELLIGENT_TIERING``, and the default value is ``STANDARD``. :doc:`/connector/hive`.
I don't think we need to mention the default value since that would already be there.
Add support for the S3 Intelligent-Tiering storage class for writing data. This can be enabled by setting the configuration property ``hive.s3.storage-class`` to ``INTELLIGENT_TIERING``.
Presto On Spark Changes
_______________________
* Improve commit memory footprint on the Driver.
Optimize Driver commit memory footprint.
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an ``IndexOutOfBoundException`` during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Fix a regression in cpu time introduced in 0.253 for queries using :func:`element_at` for maps.
* Revert a change introduced in 0.253 to store dictionary elements in Segmented Slice.
this is too low level. What's the user implication?
* Revert a change introduced in 0.253 to store dictionary elements in Segmented Slice.
* Add fragment result caching support for ``UNNEST`` queries.
* Add :func:`poisson_cdf` and :func:`inverse_poisson_cdf` functions.
* Add memory tracking in ``TableFinishOperator`` which can be enabled by setting the ``table-finish-operator-memory-tracking-enabled`` configuration property to ``true``. Enabling this property can help investigating GC issues on the coordinator, by allowing us to debug whether stats collection uses a lot of memory.
nit: can help with investigating GC issues
nit: no comma between "coordinator" and "by"
Hive Changes
____________
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support to validate the values returned from the partition cache with the actual value from Metastore. This can be enabled by setting the configuration property ``hive.partition-cache-validation-percentage``. The default value is 0.0 which means there is no validation.
nit: Add support to validate -> Add support for validating
Looks good. I'll merge once @mayankgarg1990 approves.
Looks fine overall. Some changes before this can be merged
General Changes
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an ``IndexOutOfBoundException`` during planning. The bug was introduced in release 0.253 by :pr:`16039`.
nit: shorten the second sentence to
Introduced by :pr:`16039`.
finding which release is trivial
this has not been taken care of
General Changes
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an ``IndexOutOfBoundException`` during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Fix a regression in cpu time introduced in 0.253 for queries using :func:`element_at` for maps.
Lets follow a similar syntax for reporting regressions.
Fix a CPU regression for queries using `element_at` for ``MAP``. Introduced by :pr:`16027`
:func:element_at
Hive Changes
____________
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support for validating the values returned from the partition cache with the actual value from Metastore. This can be enabled by setting the configuration property ``hive.partition-cache-validation-percentage``. The default value is 0.0 which means there is no validation.
Re: the second line - the default value is 0.0 - your previous sentence says "This can be enabled by " - that indicates that this is disabled by default. Had it been enabled by default - the sentence would have read - "This can be disabled by" - and hence it is redundant and can be removed.
Missing Release Notes
Abhisek Gautam Saikia
Akhil Umesh Mehendale
Tal Galili
Vic Zhang
guhanjie