Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid redundant parsing of repeated value in RleDecoder #6834

Merged
merged 1 commit into from
Dec 5, 2024

Conversation

jp0317
Copy link
Contributor

@jp0317 jp0317 commented Dec 4, 2024

Which issue does this PR close?

No issue

Rationale for this change

performance improvement

What changes are included in this PR?

Moving the parsing of repeated_value out of the loop

Are there any user-facing changes?

Not really: although it adds Clone restriction to the get_batch<T>(), the T in practice always implements Clone.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 4, 2024
@tustvold
Copy link
Contributor

tustvold commented Dec 4, 2024

Have you run any benchmarks for this?

@tustvold tustvold added the api-change Changes to the arrow API label Dec 4, 2024
@jp0317
Copy link
Contributor Author

jp0317 commented Dec 5, 2024

Have you run any benchmarks for this?

not really. I was doing some experiments on decoding dictionary indices and there's noticeable portion of cpu time spent on parsing this repeated value. And it seems obvious that the repeated value can be parsed outside the loop, avoiding the redundant parsing. With the change the cpu time spent inside this get_batch got reduced in my experiments.

@tustvold tustvold merged commit fa6d5e1 into apache:main Dec 5, 2024
16 checks passed
@jp0317 jp0317 deleted the rle branch December 19, 2024 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants