Support Dwrf Sequence Ids in Writer #16037

Merged: 1 commit, May 12, 2021

@akhilum akhilum commented May 3, 2021

Sequence IDs are used by DWRF FlatMap implementations. This commit introduces sequence IDs in the DWRF writer.
A follow-up PR for the flat map implementation will use the sequence ID to add multiple streams per column.
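
As a rough illustration of "multiple streams per column", the sketch below keys each stream by column, sequence ID, and stream kind, so one column can own several streams of the same kind. Every name in it (`SequenceIdSketch`, `StreamKey`, `streamsForColumn`, the `"DATA"` kind string) is hypothetical and is not taken from this PR or from the Presto code base; only the idea of a default sequence ID of zero comes from the diff under review.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; these names are not from the Presto code base.
final class SequenceIdSketch
{
    static final int DEFAULT_SEQUENCE_ID = 0;

    // Identifies one stream. The sequence ID distinguishes streams that share
    // the same column and the same kind.
    static final class StreamKey
    {
        final int column;
        final int sequence;
        final String kind;

        StreamKey(int column, int sequence, String kind)
        {
            this.column = column;
            this.sequence = sequence;
            this.kind = Objects.requireNonNull(kind, "kind is null");
        }

        @Override
        public boolean equals(Object other)
        {
            if (!(other instanceof StreamKey)) {
                return false;
            }
            StreamKey that = (StreamKey) other;
            return column == that.column && sequence == that.sequence && kind.equals(that.kind);
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(column, sequence, kind);
        }
    }

    // Registers one "DATA" stream per sequence ID for a single column.
    static Map<StreamKey, String> streamsForColumn(int column, int sequenceCount)
    {
        Map<StreamKey, String> streams = new LinkedHashMap<>();
        for (int i = 0; i < sequenceCount; i++) {
            int sequence = DEFAULT_SEQUENCE_ID + i;
            streams.put(new StreamKey(column, sequence, "DATA"), "payload-" + sequence);
        }
        return streams;
    }
}
```

Without the sequence component, the three entries produced by `streamsForColumn(3, 3)` would collapse into a single map key.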

@linux-foundation-easycla

linux-foundation-easycla bot commented May 3, 2021

CLA Signed

The committers are authorized under a signed CLA.

  • ✅ AKHIL UMESH MEHENDALE (64fb997f06bd5317e9fe47d973376e1633db2998)

@arunthirupathi arunthirupathi left a comment

I left some minor comments. The change looks good, and I will approve it once they are addressed.

@arunthirupathi
Can you please rebase on the latest master? This change will have a lot of merge conflicts with 30cd883.

@akhilum akhilum force-pushed the Flat-Map-PR-1 branch 2 times, most recently from 0867ec0 to ffb9aca Compare May 5, 2021 02:27
@akhilum akhilum force-pushed the Flat-Map-PR-1 branch 2 times, most recently from fcf3ba4 to f79a5d0 Compare May 5, 2021 21:04
@arunthirupathi arunthirupathi left a comment

It looks good once both the comments are addressed.

@akhilum akhilum changed the title from "FlatMap 1 : Pass in sequence ids in writer" to "FlatMap 1 : Support Dwrf Sequence Ids in Writer" May 5, 2021
@akhilum akhilum force-pushed the Flat-Map-PR-1 branch 2 times, most recently from ee7515e to 1ce3d07 Compare May 5, 2021 22:33
@@ -283,6 +283,7 @@ public int writeStripeFooter(SliceOutput output, StripeFooter footer)

private static OrcProto.Stream toStream(Stream stream)
{
checkArgument(stream.getSequence() == ColumnEncoding.DEFAULT_SEQUENCE_ID, "Writing streams with non-zero sequence IDs is not supported in ORC " + stream);

"+ stream" always calls stream.toString(), which creates garbage. Remove the "+ stream" or use a format specifier.

Author

Stream already implements a custom toString().

Contributor

nit: use format instead of concat

It will print the right details (a readable string rather than a memory address), so that is not the concern. The concern is that concatenation calls toString() and builds a temporary string on every call, even when the check passes. We want to avoid that.

There are two ways to avoid it:

  1. Do not include the stream in the error message. This is less preferable, as we will not see which stream failed.
  2. Call checkArgument(condition, "some message %s", object). Here object.toString() is called only when the condition is false, which avoids the garbage on the hot path.
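
The difference between the two styles can be sketched as follows. This is a minimal, self-contained illustration, not the Presto code: `LazyMessageDemo`, `TrackedStream`, `checkArgumentEager`, and `checkArgumentLazy` are hypothetical stand-ins, and the lazy helper only imitates the template-plus-arguments style of check using `String.format`. A counter in `toString()` makes the extra work visible.

```java
// Hypothetical stand-ins for illustration; not the Presto or Guava classes.
final class LazyMessageDemo
{
    static final int DEFAULT_SEQUENCE_ID = 0;

    // Minimal stream stand-in that counts how often toString() runs.
    static final class TrackedStream
    {
        private final int sequence;
        int toStringCalls;

        TrackedStream(int sequence)
        {
            this.sequence = sequence;
        }

        int getSequence()
        {
            return sequence;
        }

        @Override
        public String toString()
        {
            toStringCalls++;
            return "Stream{sequence=" + sequence + "}";
        }
    }

    // Eager style: the caller has already built the message before this runs.
    static void checkArgumentEager(boolean condition, String message)
    {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Lazy style: the template is formatted, and toString() invoked, only on failure.
    // A varargs array is still allocated by the caller on every call.
    static void checkArgumentLazy(boolean condition, String template, Object... args)
    {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public static void main(String[] args)
    {
        TrackedStream eager = new TrackedStream(DEFAULT_SEQUENCE_ID);
        checkArgumentEager(eager.getSequence() == DEFAULT_SEQUENCE_ID, "Non-zero sequence ID: " + eager);
        System.out.println("eager toString calls: " + eager.toStringCalls);

        TrackedStream lazy = new TrackedStream(DEFAULT_SEQUENCE_ID);
        checkArgumentLazy(lazy.getSequence() == DEFAULT_SEQUENCE_ID, "Non-zero sequence ID: %s", lazy);
        System.out.println("lazy toString calls: " + lazy.toStringCalls);
    }
}
```

With a passing check, the eager call still pays for one `toString()` and one concatenation, while the lazy call pays for neither.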

Author

Done. I also fixed the checkArgument in toColumnEncoding() to use a format string instead of concatenation.

@akhilum akhilum changed the title from "FlatMap 1 : Support Dwrf Sequence Ids in Writer" to "Support Dwrf Sequence Ids in Writer" May 6, 2021
@akhilum akhilum marked this pull request as ready for review May 6, 2021 17:18
@bhhari bhhari requested a review from rschlussel May 6, 2021 18:10
Contributor

@bhhari bhhari left a comment

LG

@arunthirupathi arunthirupathi self-requested a review May 6, 2021 18:34
@rschlussel rschlussel merged commit f6611cb into prestodb:master May 12, 2021
@sujay-jain sujay-jain mentioned this pull request May 21, 2021
10 tasks
@jainxrohit jainxrohit mentioned this pull request Jun 11, 2021
4 tasks