refactor(parquet): Use velox parquet reader in Metadatatest #11472

Open
wants to merge 1 commit into
base: main
from

Conversation

jkhaliqi
Contributor

@jkhaliqi jkhaliqi commented Nov 7, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 7, 2024

netlify bot commented Nov 7, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 5debd1a
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67a2a59d342575000817f2cd

@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 3 times, most recently from 4af254d to c97061c Compare November 7, 2024 22:34
Collaborator

@majetideepak majetideepak left a comment

@jkhaliqi can you move the file from the Arrow tests to here so that we see only the Velox-specific changes?

velox/dwio/parquet/reader/ParquetReader.cpp (two outdated review threads, resolved)
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 4 times, most recently from d62718d to 495598e Compare November 8, 2024 23:51
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 6 times, most recently from 8aa9724 to 85013aa Compare November 15, 2024 19:51
@jkhaliqi jkhaliqi changed the title refactor arrow reader to parquet reader in MetadataTest.cpp refactor(parquet): Use Velox Parquet Reader in the Parquet Writer tests Nov 15, 2024
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 10 times, most recently from 287d618 to d5f7423 Compare November 20, 2024 20:07
@jkhaliqi jkhaliqi changed the title refactor(parquet): Use Velox Parquet Reader in the Parquet Writer tests refactor(parquet): Use velox parquet reader in StatisticsTest and Metadatatest Nov 20, 2024
@jkhaliqi jkhaliqi marked this pull request as ready for review November 20, 2024 20:08
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch from d5f7423 to ece201e Compare November 26, 2024 16:37
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 2 times, most recently from abc8f1d to c862477 Compare December 23, 2024 23:32
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 7 times, most recently from 4642cc1 to 38f48d1 Compare January 24, 2025 18:08
Collaborator

@majetideepak majetideepak left a comment

I see the usage of ParquetFileWriter, which is from the Arrow library. We should be able to use the Velox writer for testing.
Can we move the changes in StatisticsTest.cpp to a follow-up PR? That will keep the review easy and minimal. Thanks.

@@ -18,6 +18,7 @@

#include "velox/dwio/common/Statistics.h"
#include "velox/dwio/common/compression/Compression.h"
#include "velox/dwio/parquet/thrift/ParquetThriftTypes.h"
Collaborator

Remove this include.

#include "velox/dwio/parquet/reader/ParquetReader.h"
#include "velox/exec/tests/utils/TempFilePath.h"

#include <arrow/io/api.h>
Collaborator

We should not need this include, or the #include "arrow/util/key_value_metadata.h" above.

Contributor Author

It looks like the above #include "arrow/util/key_value_metadata.h" is used for a test with the Arrow writer. Since this PR focuses on refactoring the Arrow reader to the Velox Parquet reader (issue), would it be fine to leave it for now and change it when the Arrow writer is refactored to the Velox writer?

@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 2 times, most recently from f725aa9 to dde2063 Compare January 28, 2025 00:50
@jkhaliqi jkhaliqi changed the title refactor(parquet): Use velox parquet reader in StatisticsTest and Metadatatest refactor(parquet): Use velox parquet reader in Metadatatest Jan 28, 2025
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch from dde2063 to 5debd1a Compare February 4, 2025 23:41