Add release notes for 0.254 #16141

Merged
merged 1 commit into from
Jun 1, 2021

@sujay-jain (Member) commented May 21, 2021

Missing Release Notes

Abhisek Gautam Saikia

Akhil Umesh Mehendale

Tal Galili

Vic Zhang

guhanjie

Extracted Release Notes

  • Add Presto iceberg connector #15836 (Author: Chunxu Tang): Add Presto iceberg connector
    • Add iceberg connector.
  • Do not allocate resources within test constructor #15937 (Author: v-jizhang): Do not allocate resources within test constructor
    • Do not allocate resources within test constructor, for cleaner code.
  • Fix Athena Glue table compatibility issue #15993 (Author: v-jizhang): Fix Athena Glue table compatibility issue
    • Allow accessing tables in Glue metastore that do not have a table type.
  • Add missing support for bucket sort order in Glue #16003 (Author: v-jizhang): Add missing support for bucket sort order in Glue
    • Add support for bucket sort order in Glue when creating or updating a table or partition.
  • Partition schema evolution for Parquet #16011 (Author: Jalpreet Singh Nanda (:imjalpreet)): Partition schema evolution for Parquet
    • Add support for partition schema evolution for parquet.
  • Add support for MaxResults on Glue Hive Metastore #16012 (Author: v-jizhang): Add support for MaxResults on Glue Hive Metastore
    • Add support for MaxResults on Glue Hive Metastore.
  • Add support for Glue endpoint URL #16014 (Author: v-jizhang): Add support for Glue endpoint URL
    • Add support for Glue endpoint URL. :doc:`/connector/hive`.
  • Add max error retry config option to glue client #16018 (Author: v-jizhang): Add max error retry config option to glue client
    • Add max error retry config option to glue client. Defaults to 10. :doc:`/connector/hive`.
  • Add intelligent tiering storage class #16028 (Author: v-jizhang): Add intelligent tiering storage class
    • Add intelligent tiering storage class. This is the S3 storage class to use when writing data. The STANDARD and INTELLIGENT_TIERING storage classes are supported, and the default is STANDARD. :doc:`/connector/hive`.
  • Add documentation for Glue Catalog support in Hive #16058 (Author: v-jizhang): Add documentation for Glue Catalog support in Hive
    • Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
  • Combine spill strategies #16069 (Author: Rebecca Schlussel): Combine spill strategies
    • Remove spilling strategy PER_QUERY_MEMORY_LIMIT and instead add configuration property experimental.query-limit-spill-enabled and session property query_limit_spill_enabled. When this property is set to true, and the spill strategy is not PER_TASK_MEMORY_THRESHOLD, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory, in addition to whenever the memory pool exceeds the spill threshold. This fixes an issue where using the PER_QUERY_MEMORY_LIMIT spilling strategy could prevent the oom killer from running when the memory pool was full. The issue is still present for the PER_TASK_MEMORY_THRESHOLD spilling strategy.
  • Support Unnest in fragment result caching #16071 (Author: Shixuan Fan): Support Unnest in fragment result caching
    • Add support to fragment result caching for unnest.
  • Add properties to update memory in TableFinishOperator #16095 (Author: Vic Zhang): Add properties to update memory in TableFinishOperator
    • Memory tracking in TableFinishOperator can be enabled by setting the table-finish-operator-memory-tracking-enabled configuration property to true.
  • Add support for partition cache verification #16113 (Author: Nikhil Collooru): Add support for partition cache verification
    • Add support for partition cache validation. This can be enabled by setting hive.partition-cache-validation-percentage configuration parameter.
  • Reduce memory utilization on the Driver during the commit phase #16120 (Author: Andrii Rosa): Reduce memory utilization on the Driver during the commit phase
    • Reduce commit memory footprint on the Driver.
  • Fix PlanRemoteProjections #16136 (Author: Rongrong Zhong): Fix PlanRemoteProjections
    • Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.

All Commits

  • 8aa2961 Fix PlanRemoteProjections (Rongrong Zhong)
  • e2944c6 Disable flaky test AbstractTestQueries.testEmptyJoins (Vic Zhang)
  • 8d7d0a9 Add spilling threshold for Presto on Spark (Vic Zhang)
  • 5f11dd2 Include revocable memory for peak node total memory (Vic Zhang)
  • 5f692c0 Include revocable memory for memory limit error (Vic Zhang)
  • 4d17bb0 Support spilling for Presto on Spark (Vic Zhang)
  • 129aab3 Rename TestPrestoSparkAbstractTestAggregations (Vic Zhang)
  • 4428cf7 Fix the partition cache invalidation logic to be more efficient (Nikhil Collooru)
  • a6d5270 Fix ZeroRowFileCreator to use DataSink instead of OutputStream interface (Arjun Gupta)
  • 5eb36ae Add session property for temp storage buffer size (Vic Zhang)
  • 100c120 Disable flaky TestQueues.testQueuedQueryInteraction (Vic Zhang)
  • 4bd5dc5 Log sizes for pages received on the Driver (Andrii Rosa)
  • 2692f30 Release inmemory input pages incrementally (Andrii Rosa)
  • a31785b Support compression for PartitionUpdate in Hive connector (Andrii Rosa)
  • 81e45ab Fix MetastoreContext creation in presto-iceberg module (Nikhil Collooru)
  • baea41b Allow StorageOrcFileTailSource to read DWRF stripe cache data (Sergii Druzkin)
  • 9958e19 Make ORC read tail size configurable (Sergii Druzkin)
  • 6681e91 Fix Athena Glue table compatibility issue (v-jizhang)
  • e87999e Add missing support for bucket sort order in Glue (v-jizhang)
  • b8b0e52 Upgrade to 0.254 (Chunxu Tang)
  • 97953b1 Adjust the test change of AbstractTestDistributedQueries (Chunxu Tang)
  • 1769297 Include presto-iceberg in example config (Chunxu Tang)
  • 70fd449 Remove unnecessary comments (Chunxu Tang)
  • cea7117 Add MetastoreContext in the iceberg HiveTableOperations (Chunxu Tang)
  • 656f7af Adjust the usage of Hive Metastore with MetastoreContext (Chunxu Tang)
  • 6800304 Fix review comments (Chunxu Tang)
  • 8180ce3 Upgrade Presto to 0.252 (Chunxu Tang)
  • d749f3f Refactor HdfsFileIo names (Chunxu Tang)
  • b6ba913 Fix parquet version conflict (Chunxu Tang)
  • ddb67db Include iceberg connector in the root POM (Chunxu Tang)
  • 2d37a11 Set up query tests (Chunxu Tang)
  • f738ed3 Add iceberg connector (Chunxu Tang)
  • 28f558b upgrade parquet to 1.11.0 (Chunxu Tang)
  • a7a9812 Use username for comparison only when impersonation is enabled (Nikhil Collooru)
  • fca1f24 Add support for partition cache verification (Nikhil Collooru)
  • 8f4ef27 Add support for MaxResults on Glue Hive Metastore (v-jizhang)
  • 8be15cf Add a flag for HttpRemoteTask to avoid eager but unnecessary sendUpdate (guhanjie)
  • e9c4a30 Decode Parquet dictionary faster (Zhenxiao Luo)
  • c9571e1 Fix generics in declaration of ValueSet.copyOf (Zhenxiao Luo)
  • 6a4f0ab Benchmark Parquet dictionary to Domain conversion (Zhenxiao Luo)
  • b302af1 Add a test for BenchmarkSortedRangeSet (Zhenxiao Luo)
  • ca09cd3 Simplify Range consumption (Zhenxiao Luo)
  • d887f69 Add benchmark for SortedRangeSet's getOrderedRanges (Zhenxiao Luo)
  • 3faafe1 Benchmark SortedRangeSet methods (Zhenxiao Luo)
  • 494b7f5 Plumb DWRF stripe cache info from proto to model (Sergii Druzkin)
  • 54c3304 Add queryId to MetastoreContext (Nikhil Collooru)
  • c63a7c7 Use double quotes for all column names in SHOW CREATE TABLE (Abhisek Gautam Saikia)
  • 18f7e17 Use alias in case of predicate stitching (Rohit Jain)
  • 2e180ba Add support for partition schema evolution for Parquet (Jalpreet Singh Nanda (:imjalpreet))
  • 9e07362 Introduce TableToPartitionMapping (Jalpreet Singh Nanda (:imjalpreet))
  • 865753c Allow classes that inherit from AbstractTestQueryFramework to supply expectedQueryRunner. (Sergey Pershin)
  • d441316 Revert "Revert "Bring retry queries to the beginning of the queue"" (Mayank Garg)
  • 779afcc Add properties to update memory in TableFinishOperator (Vic Zhang)
  • 7f9e242 Catch UncheckedIOException while reading broadcast table from storage (Arjun Gupta)
  • aa59a8d Do not allocate resources within test constructor (v-jizhang)
  • f434ea1 Disable flaky test in TestJdbcClient (Rebecca Schlussel)
  • 970e67e Disable flaky test TestPrestoDriver.testQueryCancelByInterrupt (Rebecca Schlussel)
  • 44240cc Combine memory pool and query based spill strategies (Rebecca Schlussel)
  • eae5266 Remove periodic check from MemoryRevokingScheduler (Rebecca Schlussel)
  • dec3195 Support Unnest in fragment result caching (Shixuan Fan)
  • 39c66e3 Adding the Poisson distribution (Tal Galili)
  • 77989bf Add max error retry config option to glue client (v-jizhang)
  • 012e9bd Add support for Glue endpoint URL (v-jizhang)
  • 486f4e2 Define custom json serializer for DataSize and Duration (Abhisek Gautam Saikia)
  • 54bde4c Fix iterator in IndexedPriorityQueue (Tim Meehan)
  • 88a14ba Include queuing time while computing query completion deadline (Arjun Gupta)
  • cdc9dca Release note for 0.251.1 (Ke Wang)
  • 150fb77 Revert "Bring retry queries to the beginning of the queue" (Bhavani Hari)
  • 408c2a1 Add documentation for Glue Catalog support in Hive (v-jizhang)
  • 7e4fe3d Add intelligent tiering storage class (v-jizhang)
  • 3d78f13 Add cluster-wide endpoints for query and cluster information (Tim Meehan)
  • f6611cb Support Dwrf Sequence Ids in Writer (Akhil Umesh Mehendale)
  • 54a7ec7 Fix typo in connectors.rst (linjunhua)
  • 2999330 Short-circuit TupleDomain columnWiseUnion and intersect (Zhenxiao Luo)

@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch from e7c12a3 to 4361a9e Compare May 21, 2021 23:38
@sujay-jain sujay-jain requested a review from ajaygeorge May 21, 2021 23:41
* Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
* Add iceberg connector.
* Add support to fragment result caching for unnest.
* Add ``poisson_cdf`` and ``inverse_poisson_cdf`` functions.
@ajaygeorge (Contributor) commented May 21, 2021

Functions need to be written as :func:`poisson_cdf`.
Check e2e4751#diff-9652f5c5a99ec7ab2293ce1dfbee5491f95d8b02ed72259cbac24f0dfbec565fR13

@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch 2 times, most recently from cc183dd to 6106301 Compare May 22, 2021 00:00
General Changes
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
Contributor

I don't think we need a release note for new documentation, but if we want to keep it, it should go in the Hive section.

* Add memory tracking in ``TableFinishOperator`` which can be enabled by setting the ``table-finish-operator-memory-tracking-enabled`` configuration property to ``true``.
* Remove spilling strategy ``PER_QUERY_MEMORY_LIMIT`` and instead add configuration property ``experimental.query-limit-spill-enabled`` and session property ``query_limit_spill_enabled``. When this property is set to ``true``, and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory, in addition to whenever the memory pool exceeds the spill threshold. This fixes an issue where using the ``PER_QUERY_MEMORY_LIMIT`` spilling strategy could prevent the oom killer from running when the memory pool was full. The issue is still present for the ``PER_TASK_MEMORY_THRESHOLD`` spilling strategy.

Hive Connector Changes
Contributor

This should all be part of the Hive Changes section

_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
* Add iceberg connector.
Contributor

I would maybe add a new section called "Iceberg Changes", similar to other connectors, and then have the release note "Add new Iceberg Connector", with a link to the documentation.

Member Author

removed this since there's no documentation added

* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Add documentation for Glue Catalog support in Hive. :doc:`/connector/hive`.
* Add iceberg connector.
* Add support to fragment result caching for unnest.
Contributor

what does this mean?

Member Author

@shixuan-fan could you provide more details

Contributor

How about "Add fragment result caching support for UNNEST queries"?


Hive Changes
____________
* Add support for MaxResults on Glue Hive Metastore.
Contributor

is there a configuration property? what is MaxResults and why is it one word?

Member Author

@v-jizhang can you help add more details around MaxResults here. I don't see a configuration property.

Contributor

@sujay-jain @rschlussel There is no configuration property for MaxResults. Set it according to the page https://docs.aws.amazon.com/glue/latest/webapi/API_GetPartitions.html

Hive Changes
____________
* Add support for MaxResults on Glue Hive Metastore.
* Add support for bucket sort order in Glue when creating or updating a table or partition.
Contributor

what is "bucket sort order" and how does the user use this?

Member Author

@v-jizhang @highker could you provide more details here.

Contributor

This is a back port from Trino to fix a bug, trinodb/trino#2450

Member Author

@rschlussel the release notes match those of Trino, and it's a backport. Do you think we should keep this as is?

Contributor

Thanks @v-jizhang. I looked at that PR. We should make our release note clearer, something like:

Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue

Also, since it's a bug fix, it should go first in the section.

* Add support for MaxResults on Glue Hive Metastore.
* Add support for bucket sort order in Glue when creating or updating a table or partition.
* Add support for partition cache validation. This can be enabled by setting ``hive.partition-cache-validation-percentage`` configuration parameter.
* Add support for partition schema evolution for parquet.
Contributor

Can we add more detail in the release note here about what we mean by schema evolution? Does it mean you can query tables that have had columns added/deleted? column types changed?

Member Author

@imjalpreet @zhenxiao - could you help provide more details

Collaborator

support add/delete/replace columns in partition schema
@imjalpreet could you please add more details?

Member

We can write something like:

Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index. 

Member

If we want to add more:

This allows schema evolution between table and partition in addition to schema evolution between the table/partition and file. Columns can be re-ordered, added or dropped between partition and table schemas.
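
The difference between mapping by index and mapping by name can be illustrated with a small sketch. This is only a model of the idea described in the comment above, not Presto's implementation: `map_partition_columns` is a hypothetical helper, columns are represented as plain name strings, and type coercion and case handling are ignored.

```python
def map_partition_columns(table_columns, partition_columns, use_column_names):
    """For each table column, return the index of the partition column
    it should be read from, or None if the partition has no such column."""
    if use_column_names:
        # Match by name: re-ordered, added, or dropped columns are tolerated.
        index_by_name = {name: i for i, name in enumerate(partition_columns)}
        return [index_by_name.get(name) for name in table_columns]
    # Match by position: the i-th table column reads the i-th partition column.
    return [i if i < len(partition_columns) else None
            for i in range(len(table_columns))]


table = ["a", "b", "c"]        # current table schema
partition = ["b", "a"]         # older partition: re-ordered, and "c" missing

by_name = map_partition_columns(table, partition, use_column_names=True)
by_index = map_partition_columns(table, partition, use_column_names=False)
```

With name matching, `a` and `b` are read from the correct partition columns even though their order differs, while index matching would silently read `b`'s data for `a`.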

____________
* Add support for MaxResults on Glue Hive Metastore.
* Add support for bucket sort order in Glue when creating or updating a table or partition.
* Add support for partition cache validation. This can be enabled by setting ``hive.partition-cache-validation-percentage`` configuration parameter.
Contributor

What's partition cache validation? Also, what's the default value? Is it 0, which would I guess mean no validation or is it something else?

Member Author

@NikhilCollooru can you provide more details here.

Contributor

Partition cache validation means we validate the value returned from the partition cache against the actual value from the Metastore. Yes, the default value is 0.0, meaning no validation. If we set it to 50.0, then 50% of the get-partitions calls will be validated.
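
A minimal sketch of percentage-based validation as described above. The names `should_validate` and `get_partition` are hypothetical, the cache and Metastore are modeled as dictionaries, and the use of a random draw to select the validated fraction of calls is an assumption, not something stated in the PR.

```python
import random


def should_validate(validation_percentage, rng=random):
    """Return True for roughly `validation_percentage` percent of calls."""
    if validation_percentage <= 0.0:
        return False  # 0.0 (the default) disables validation entirely
    # rng.random() is in [0.0, 1.0), so 100.0 always validates.
    return rng.random() * 100.0 < validation_percentage


def get_partition(key, cache, metastore, validation_percentage, rng=random):
    """Serve a partition from the cache, cross-checking a sample of calls."""
    cached = cache[key]
    if should_validate(validation_percentage, rng):
        actual = metastore[key]
        if cached != actual:
            raise ValueError(f"partition cache mismatch for {key!r}")
    return cached
```

For example, with `validation_percentage=50.0` about half of the `get_partition` calls would also read from the Metastore and compare the two values.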

Presto On Spark Changes
_______________________
* Improve commit memory footprint on the Driver.
* Add session property ``spark_memory_revoking_threshold`` and configuration property ``spark.memory-revoking-threshold``, spilling is triggered when total memory is beyond this threshold.
Contributor

Nit: should be a period and not a comma before "spilling is triggered"

@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch 3 times, most recently from 16f7c2d to c645556 Compare May 25, 2021 21:56
@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch 2 times, most recently from cd8a8c2 to 1186bd0 Compare May 26, 2021 01:37
Hive Changes
____________
* Add support for MaxResults on Glue Hive Metastore. MaxResults is the maximum number of partitions to return in a single response.
Contributor

@v-jizhang I'm still very confused about this release note. It seems like what the PR (#16012) actually does is set the batch size for how many partitions we fetch at once in each API call. But @pettyjamesm was skeptical about whether it even works? My instinct is this doesn't need a release note, but if it does it should be something like Improve efficiency of getting partitions from the Glue Metastore by batching requests. Would that be accurate? @aweisberg @pettyjamesm what do you think?

Contributor

Yeah, I agree. I don't think this is turning on batching of the requests; it's just limiting the batch size.
Improve efficiency of partition fetching from Glue by setting GetPartitions MaxResults
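
To make the "limiting the batch size" point concrete, here is a rough sketch of paginated fetching with a per-call cap. It is not the Glue client code from the PR: `get_all_partitions` and `fetch_page` are hypothetical, with `fetch_page(next_token, max_results)` assumed to return a page of partitions plus a continuation token (or `None` when there are no more pages).

```python
def get_all_partitions(fetch_page, max_results):
    """Collect every partition by following continuation tokens.

    `max_results` only caps how many partitions each call may return;
    the loop over pages happens regardless of its value.
    """
    partitions = []
    next_token = None
    while True:
        page, next_token = fetch_page(next_token, max_results)
        partitions.extend(page)
        if next_token is None:
            return partitions
```

In this model the requests are already paginated, and `max_results` changes only the size of each page, which matches the distinction drawn in the comment above.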

Member Author

going with removing the release note.

* Add support for bucket sort order in Glue when creating or updating a table or partition.
* Add support for partition cache validation. This can be enabled by setting the ``hive.partition-cache-validation-percentage`` configuration property. Partition cache validation allows us to validate the value returned from partition cache with the actual value from Metastore. The default value is 0.0 which means there is no validation.
* Add support for partition schema evolution for parquet.
* Add support for Glue endpoint URL :doc:`/connector/hive`.
Contributor

Add support for configuring the Glue endpoint URL :doc:`/connector/hive`. 

* Add support for Glue endpoint URL :doc:`/connector/hive`.
* Add support for the S3 Intelligent-Tiering storage class for writing data. The S3 storage class can be configured using the configuration property ``hive.s3.storage-class``. Supported values are ``STANDARD`` and ``INTELLIGENT_TIERING``, and the default value is ``STANDARD``. :doc:`/connector/hive`.
* Add configuration property ``hive.metastore.glue.max-error-retries`` for the maximum number of retries for glue client connections. The default value is 10. :doc:`/connector/hive`.
* Allow accessing tables in Glue metastore that do not have a table type.
Contributor

Maybe start this "Add support for" instead of "Allow", so it matches the other release notes.

@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch from 1186bd0 to 2629bcf Compare May 27, 2021 00:39
@sujay-jain sujay-jain requested a review from rschlussel May 27, 2021 00:39
@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch 2 times, most recently from 319cc16 to dfce5cd Compare May 27, 2021 00:49
@rschlussel (Contributor)

Looks good, but we'll need to add a release note for the regression fix for maps that's going to be added to the 0.254 branch, so holding off on merging for now.

@mayankgarg1990 (Contributor) left a comment

Can you copy the release note from #16073 - this was added after the cut and hence not picked up by the script that created the initial PR


General Changes
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an IndexOutOfBoundException during planning. The bug was introduced in release 0.253 by :pr:`16039`.
Contributor

IndexOutOfBoundsException in double back quotes

* Add fragment result caching support for ``UNNEST`` queries.
* Add :func:`poisson_cdf` and :func:`inverse_poisson_cdf` functions.
* Add memory tracking in ``TableFinishOperator`` which can be enabled by setting the ``table-finish-operator-memory-tracking-enabled`` configuration property to ``true``. Enabling this property can help investigating GC issues on the coordinator, by allowing us to debug whether stats collection uses a lot of memory.
* Remove spilling strategy ``PER_QUERY_MEMORY_LIMIT`` and instead add configuration property ``experimental.query-limit-spill-enabled`` and session property ``query_limit_spill_enabled``. When ``query_limit_spill_enabled`` is set to ``true`` and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory. We will also still spill whenever the memory pool exceeds the spill threshold. This fixes an issue where using the ``PER_QUERY_MEMORY_LIMIT`` spilling strategy could prevent the oom killer from running when the memory pool was full. The issue is still present for the ``PER_TASK_MEMORY_THRESHOLD`` spilling strategy.
Contributor

This is too big for a release note.

Contributor

@rschlussel - can we not try to get this within 200-250 chars at max - it is fine to provide people a link to the PR for further clarification

Contributor

how's this? it's 390 characters, but mostly because the property names are long. I don't think it can be shorter and still convey the relevant information.

Remove spilling strategy ``PER_QUERY_MEMORY_LIMIT`` and add configuration property ``experimental.query-limit-spill-enabled`` and session property ``query_limit_spill_enabled``. When set to ``true`` and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory.
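
For readers unfamiliar with the two properties named in the note above, a minimal sketch of how they would be set. The property names come from the note; the file location `etc/config.properties` is an assumption based on a conventional Presto deployment layout, not something stated in this PR.

```properties
# etc/config.properties (file location is an assumption; property name is from the release note)
experimental.query-limit-spill-enabled=true
```

The session property could presumably be set per session with the `SET SESSION` statement:

```sql
-- Session-level equivalent; property name is from the release note
SET SESSION query_limit_spill_enabled = true;
```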

Hive Changes
____________
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support for partition cache validation. This can be enabled by setting the ``hive.partition-cache-validation-percentage`` configuration property. Partition cache validation allows us to validate the value returned from partition cache with the actual value from Metastore. The default value is 0.0 which means there is no validation.
Contributor

Add support to validate the values returned from the partition cache with the actual value from Metastore. This can be enabled by setting the configuration property ``hive......``

Member Author

Should we keep the default value note?

____________
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support for partition cache validation. This can be enabled by setting the ``hive.partition-cache-validation-percentage`` configuration property. Partition cache validation allows us to validate the value returned from partition cache with the actual value from Metastore. The default value is 0.0 which means there is no validation.
* Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index.
Contributor

true in double back quotes

* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support for partition cache validation. This can be enabled by setting the ``hive.partition-cache-validation-percentage`` configuration property. Partition cache validation allows us to validate the value returned from partition cache with the actual value from Metastore. The default value is 0.0 which means there is no validation.
* Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index.
* Add support for configuring the Glue endpoint URL :doc:`/connector/hive`.
Contributor

Period after URL

Add support for configuring the Glue endpoint URL. See :doc:`/connector/hive`.

* Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index.
* Add support for configuring the Glue endpoint URL :doc:`/connector/hive`.
* Add support for accessing tables in Glue metastore that do not have a table type.
* Add support for the S3 Intelligent-Tiering storage class for writing data. The S3 storage class can be configured using the configuration property ``hive.s3.storage-class``. Supported values are ``STANDARD`` and ``INTELLIGENT_TIERING``, and the default value is ``STANDARD``. :doc:`/connector/hive`.
Contributor

I don't think we need to mention the default value since that would already be there.

Add support for the S3 Intelligent-Tiering storage class for writing data. This can be enabled by setting the configuration property ``hive.s3.storage-class`` to ``INTELLIGENT_TIERING``.
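
As a minimal sketch of what the suggested wording describes: the property name and value are taken from the note, while the catalog file name `etc/catalog/hive.properties` is an assumption that depends on how the Hive catalog is named in a given deployment.

```properties
# etc/catalog/hive.properties (catalog file name is an assumption)
# Write new data to S3 using the Intelligent-Tiering storage class
hive.s3.storage-class=INTELLIGENT_TIERING
```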


Presto On Spark Changes
_______________________
* Improve commit memory footprint on the Driver.
Contributor

Optimize Driver commit memory footprint.

@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch 2 times, most recently from 34d4ac0 to 8fdee7b Compare May 27, 2021 23:47
@sujay-jain sujay-jain requested a review from mayankgarg1990 May 27, 2021 23:47
@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch 2 times, most recently from c3426e4 to 503e9fe Compare June 1, 2021 13:59
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an ``IndexOutOfBoundException`` during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Fix a regression in cpu time introduced in 0.253 for queries using :func:`element_at` for maps.
* Revert a change introduced in 0.253 to store dictionary elements in Segmented Slice.
Contributor

This is too low-level. What's the user implication?

* Revert a change introduced in 0.253 to store dictionary elements in Segmented Slice.
* Add fragment result caching support for ``UNNEST`` queries.
* Add :func:`poisson_cdf` and :func:`inverse_poisson_cdf` functions.
* Add memory tracking in ``TableFinishOperator`` which can be enabled by setting the ``table-finish-operator-memory-tracking-enabled`` configuration property to ``true``. Enabling this property can help investigating GC issues on the coordinator, by allowing us to debug whether stats collection uses a lot of memory.
Contributor

nit: can help with investigating GC issues
nit: no comma between "coordinator" and "by"

Hive Changes
____________
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support to validate the values returned from the partition cache with the actual value from Metastore. This can be enabled by setting the configuration property ``hive.partition-cache-validation-percentage``. The default value is 0.0 which means there is no validation.
Contributor

nit: Add support to validate -> Add support for validating

@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch from 503e9fe to 1a0848b Compare June 1, 2021 14:50
@sujay-jain sujay-jain requested a review from rschlussel June 1, 2021 14:58
Contributor

@rschlussel rschlussel left a comment

Looks good. I'll merge once @mayankgarg1990 approves.

Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Looks fine overall. Some changes are needed before this can be merged.


General Changes
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an ``IndexOutOfBoundException`` during planning. The bug was introduced in release 0.253 by :pr:`16039`.
Contributor

nit: shorten the second sentence to

Introduced by :pr:`16039`.

finding which release is trivial

Contributor

this has not been taken care of

General Changes
_______________
* Fix a bug where queries that have both remote functions and a local function with only constant arguments could throw an ``IndexOutOfBoundException`` during planning. The bug was introduced in release 0.253 by :pr:`16039`.
* Fix a regression in cpu time introduced in 0.253 for queries using :func:`element_at` for maps.
Contributor

Let's follow a similar syntax for reporting regressions.

Fix a CPU regression for queries using `element_at` for ``MAP``. Introduced by :pr:`16027`

Contributor

:func:element_at

Hive Changes
____________
* Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
* Add support for validating the values returned from the partition cache with the actual value from Metastore. This can be enabled by setting the configuration property ``hive.partition-cache-validation-percentage``. The default value is 0.0 which means there is no validation.
Contributor

Re: the second sentence ("The default value is 0.0"): your previous sentence says "This can be enabled by", which already indicates that this is disabled by default. Had it been enabled by default, the sentence would have read "This can be disabled by". Hence the default-value sentence is redundant and can be removed.

@sujay-jain sujay-jain force-pushed the release-notes-0.254 branch from 1a0848b to 4b07bfc Compare June 1, 2021 17:32
@rschlussel rschlussel merged commit 665b1cb into prestodb:master Jun 1, 2021