chore: Make parquet reader options Comet options instead of Hadoop options #968
Conversation
@@ -615,12 +617,12 @@ public void testColumnIndexReadWrite() throws Exception {
   @Test
   public void testWriteReadMergeScanRange() throws Throwable {
     Configuration conf = new Configuration();
-    conf.set(ReadOptions.COMET_IO_MERGE_RANGES, Boolean.toString(true));
+    conf.set(CometConf.COMET_IO_MERGE_RANGES().key(), Boolean.toString(true));
This test is still setting the config values in a Hadoop Configuration rather than in the Spark config. Would it make sense to update the test?
There is no Spark context in this test. I've added a new test with the configuration set through the Spark config.
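For reference, a minimal sketch of what such a Spark-driven test could look like, assuming a Scala suite that mixes in Spark's SQLTestUtils (for withSQLConf) and exposes a SparkSession named `spark`; the path `testParquetPath` and the test name are illustrative placeholders, not taken from the PR:

```scala
// Hedged sketch: any names other than the CometConf keys are assumptions.
test("merge scan ranges configured via Spark conf") {
  withSQLConf(
    CometConf.COMET_IO_MERGE_RANGES.key -> "true",
    CometConf.COMET_IO_MERGE_RANGES_DELTA.key -> "1024") {
    // Any Parquet read inside this block should pick up the merged-range settings.
    val df = spark.read.parquet(testParquetPath)
    assert(df.count() >= 0)
  }
}
```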
@@ -173,14 +147,24 @@ public Builder(Configuration conf) {
       this.conf = conf;
       this.parallelIOEnabled =
           conf.getBoolean(
This is reading from the Hadoop conf. If I set the new configs on my Spark context, how would they get propagated to the Hadoop conf?
Spark SQL copies configs that are not Spark configs into the Hadoop config when the SQL context is created. Other settings already rely on this (e.g. COMET_USE_LAZY_MATERIALIZATION, COMET_SCAN_PREFETCH_ENABLED).
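A short sketch of the propagation being described, assuming a SparkSession named `spark`: Spark SQL builds the Hadoop conf it hands to data sources from the session state, and SQL confs such as the Comet keys are copied into it. The assertion below is illustrative, not code from the PR.

```scala
// Set a Comet option on the Spark session, then check the derived Hadoop conf.
spark.conf.set(CometConf.COMET_IO_MERGE_RANGES.key, "true")
val hadoopConf = spark.sessionState.newHadoopConf()
assert(hadoopConf.get(CometConf.COMET_IO_MERGE_RANGES.key) == "true")
```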
withSQLConf(
  CometConf.COMET_BATCH_SIZE.key -> batchSize.toString,
  CometConf.COMET_IO_MERGE_RANGES.key -> "true",
  CometConf.COMET_IO_MERGE_RANGES_DELTA.key -> mergeRangeDelta.toString) {
Is it possible to test that this config is actually making it into ReadOptions? If I comment out all of the code in ReadOptions that reads these configs, the test still passes. Perhaps we just need a specific test to show that setting the config on a Spark context causes a change to the ReadOptions?
I added some debug logging and I do see that it is working correctly, but it would be good to have a test to confirm (and prevent regressions):
test is setting COMET_IO_MERGE_RANGES_DELTA = 1048576
ReadOptions ioMergeRangesDelta = 1048576
test is setting COMET_IO_MERGE_RANGES_DELTA = 1024
ReadOptions ioMergeRangesDelta = 1024
I added an additional check for the config. The configuration passed in to ReadOptions is not accessible, but I tried to simulate the next best thing. See if that makes sense.
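A hedged approximation of such an indirect check (the exact assertion in the PR may differ): rebuild the Hadoop conf the same way Spark hands it to the reader and verify that the value set through the Spark conf is what ReadOptions would read.

```scala
// Illustrative only; assumes a suite with withSQLConf and a SparkSession `spark`.
withSQLConf(CometConf.COMET_IO_MERGE_RANGES_DELTA.key -> "1024") {
  val hadoopConf = spark.sessionState.newHadoopConf()
  assert(hadoopConf.getInt(CometConf.COMET_IO_MERGE_RANGES_DELTA.key, -1) == 1024)
}
```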
Thanks, Parth. LGTM. Looks like you need to run make format to fix some import ordering.
Thanks @andygrove. Fixed style.
LGTM with some comments on testing.
Which issue does this PR close?
Closes #876.
Rationale for this change
Moves the parallel reader's configuration options into CometConf and adds documentation so that end users can easily understand how to configure the reader.
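As an illustration of the end-user experience after this change (not code from the PR): the options can be set on the Spark session rather than on a Hadoop Configuration. The key strings are whatever CometConf defines; they are referenced via `.key` below to avoid guessing, and the delta value is just an example.

```scala
import org.apache.comet.CometConf
import org.apache.spark.sql.SparkSession

// Hedged sketch: enable merged ranges and set an example 1 MiB merge delta.
val spark = SparkSession.builder()
  .appName("comet-parallel-reader-example")
  .config(CometConf.COMET_IO_MERGE_RANGES.key, "true")
  .config(CometConf.COMET_IO_MERGE_RANGES_DELTA.key, (1 << 20).toString)
  .getOrCreate()
```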
How are these changes tested?
Existing unit test